0 / 0
String functions
Last updated: Dec 20, 2024
String functions (SPSS Modeler)

With CLEM, you can run operations to compare strings, create strings, or access characters.

In CLEM, a string is any sequence of characters between matching double quotation marks ("string quotes"). Characters (CHAR) can be any single alphanumeric character. They're declared in CLEM expressions using single back quotes in the form of `<character>`, such as `z`, `A`, or `2`. Characters that are out-of-bounds or negative indices to a string will result in undefined behavior.

Note: Comparisons between strings that do and do not use SQL pushback may generate different results where trailing spaces exist.
Table 1. CLEM string functions
Function Result Description
allbutfirst(N, STRING) String Returns a string, which is STRING with the first N characters removed.
allbutlast(N, STRING) String Returns a string, which is STRING with the last characters removed.
alphabefore(STRING1, STRING2) Boolean Used to check the alphabetical ordering of strings. Returns true if STRING1 precedes STRING2.
count_substring(STRING, SUBSTRING) Integer Returns the number of times the specified substring occurs within the string. For example, count_substring("foooo.txt", "oo") returns 3.
endstring(LENGTH, STRING) String Extracts the last N characters from the specified string. If the string length is less than or equal to the specified length, then it is unchanged.
hasendstring(STRING, SUBSTRING) Integer This function is the same as isendstring(SUBSTRING, STRING).
hasmidstring(STRING, SUBSTRING) Integer This function is the same as ismidstring(SUBSTRING, STRING) (embedded substring).
hasstartstring(STRING, SUBSTRING) Integer This function is the same as isstartstring(SUBSTRING, STRING).
hassubstring(STRING, N, SUBSTRING) Integer This function is the same as issubstring(SUBSTRING, N, STRING), where N defaults to 1.
hassubstring(STRING, SUBSTRING) Integer This function is the same as issubstring(SUBSTRING, 1, STRING), where N defaults to 1.
isalphacode(CHAR) Boolean Returns a value of true if CHAR is a character in the specified string (often a field name) whose character code is a letter. Otherwise, this function returns a value of 0. For example, isalphacode(produce_num(1)).
isendstring(SUBSTRING, STRING) Integer If the string STRING ends with the substring SUBSTRING, then this function returns the integer subscript of SUBSTRING in STRING. Otherwise, this function returns a value of 0.
islowercode(CHAR) Boolean Returns a value of true if CHAR is a lowercase letter character for the specified string (often a field name). Otherwise, this function returns a value of 0. For example, both islowercode(``) and islowercode(country_name(2)) are valid expressions.
ismidstring(SUBSTRING, STRING) Integer If SUBSTRING is a substring of STRING but does not start on the first character of STRING or end on the last, then this function returns the subscript at which the substring starts. Otherwise, this function returns a value of 0.
isnumbercode(CHAR) Boolean Returns a value of true if CHAR for the specified string (often a field name) is a character whose character code is a digit. Otherwise, this function returns a value of 0. For example, isnumbercode(product_id(2)).
isstartstring(SUBSTRING, STRING) Integer If the string STRING starts with the substring SUBSTRING, then this function returns the subscript 1. Otherwise, this function returns a value of 0.
issubstring(SUBSTRING, N, STRING) Integer Searches the string STRING, starting from its Nth character, for a substring equal to the string SUBSTRING. If found, this function returns the integer subscript at which the matching substring begins. Otherwise, this function returns a value of 0. If N is not given, this function defaults to 1.
issubstring(SUBSTRING, STRING) Integer Searches the string STRING. If found, this function returns the integer subscript at which the matching substring begins. Otherwise, this function returns a value of 0.
issubstring_count(SUBSTRING, N, STRING) Integer Returns the index of the Nth occurrence of SUBSTRING within the specified STRING. If there are fewer than N occurrences of SUBSTRING, 0 is returned.
issubstring_lim(SUBSTRING, N, STARTLIM, ENDLIM, STRING) Integer This function is the same as issubstring, but the match is constrained to start on STARTLIM and to end on ENDLIM. The STARTLIM or ENDLIM constraints may be disabled by supplying a value of false for either argument—for example, issubstring_lim(SUBSTRING, N, false, false, STRING) is the same as issubstring.
isuppercode(CHAR) Boolean Returns a value of true if CHAR is an uppercase letter character. Otherwise, this function returns a value of 0. For example, both isuppercode(``) and isuppercode(country_name(2)) are valid expressions.
last(STRING) String Returns the last character CHAR of STRING (which must be at least one character long).
length(STRING) Integer Returns the length of the string STRING (that is, the number of characters in it).
locchar(CHAR, N, STRING) Integer Used to identify the location of characters in symbolic fields. The function searches the string STRING for the character CHAR, starting the search at the Nth character of STRING. This function returns a value indicating the location (starting at N) where the character is found. If the character is not found, this function returns a value of 0. If the function has an invalid offset (N) (for example, an offset that is beyond the length of the string), this function returns $null$.
For example, locchar(`n`, 2, web_page) searches the field called web_page for the `n` character beginning at the second character in the field value.
Be sure to use single back quotes to encapsulate the specified character.
locchar_back(CHAR, N, STRING) Integer Similar to locchar, except that the search is performed backward starting from the Nth character. For example, locchar_back(`n`, 9, web_page) searches the field web_page starting from the ninth character and moving backward toward the start of the string. If the function has an invalid offset (for example, an offset that is beyond the length of the string), this function returns $null$. Ideally, you should use locchar_back in conjunction with the function length(<field>) to dynamically use the length of the current value of the field. For example, locchar_back(`n`, (length(web_page)), web_page).
lowertoupper(CHAR) lowertoupper (STRING) CHAR or String Input can be either a string or character, which is used in this function to return a new item of the same type, with any lowercase characters converted to their uppercase equivalents. For example, lowertoupper(`a`), lowertoupper(“My string”), and lowertoupper(field_name(2)) are all valid expressions.
matches Boolean Returns true if a string matches a specified pattern. The pattern must be a string literal; it can't be a field name containing a pattern. You can include a question mark (?) in the pattern to match exactly one character; an asterisk (*) matches zero or more characters. To match a literal question mark or asterisk (rather than using these as wildcards), use a backslash (\) as an escape character.
replace(SUBSTRING, NEWSUBSTRING, STRING) String Within the specified STRING, replace all instances of SUBSTRING with NEWSUBSTRING.
replicate(COUNT, STRING) String Returns a string that consists of the original string copied the specified number of times.
stripchar(CHAR,STRING) String Enables you to remove specified characters from a string or field. You can use this function, for example, to remove extra symbols, such as currency notations, from data to achieve a simple number or name. For example, using the syntax stripchar(`$`, 'Cost') returns a new field with the dollar sign removed from all values.
Be sure to use single back quotes to encapsulate the specified character.
skipchar(CHAR, N, STRING) Integer Searches the string STRING for any character other than CHAR, starting at the Nth character. This function returns an integer substring indicating the point at which one is found or 0 if every character from the Nth onward is a CHAR. If the function has an invalid offset (for example, an offset that is beyond the length of the string), this function returns $null$.
locchar is often used in conjunction with the skipchar functions to determine the value of N (the point at which to start searching the string). For example, skipchar(`s`, (locchar(`s`, 1, "MyString")), "MyString").
skipchar_back(CHAR, N, STRING) Integer Similar to skipchar, except that the search is performed backward, starting from the Nth character.
startstring(N, STRING) String Extracts the first N characters from the specified string. If the string length is less than or equal to the specified length, then it is unchanged.
strmember(CHAR, STRING) Integer Equivalent to locchar(CHAR, 1, STRING). It returns an integer substring indicating the point at which CHAR first occurs, or 0. If the function has an invalid offset (for example, an offset that is beyond the length of the string), this function returns $null$.
subscrs(N, STRING) CHAR Returns the Nth character CHAR of the input string STRING. This function can also be written in a shorthand form as STRING(N). For example, lowertoupper(“name”(1)) is a valid expression.
substring(N, LEN, STRING) String Returns a string SUBSTRING, which consists of the LEN characters of the string STRING, starting from the character at subscript N.
substring_between(N1, N2, STRING) String Returns the substring of STRING, which begins at subscript N1 and ends at subscript N2.
textsplit(STRING, N, CHAR) String textsplit(STRING,N,CHAR) returns the substring between the Nth-1 and Nth occurrence of CHAR. If N is 1, then it will return the substring from the beginning of STRING up to but not including CHAR. If N-1 is the last occurrence of CHAR, then it will return the substring from the Nth-1 occurrence of CHAR to the end of the string.
trim(STRING) String Removes leading and trailing white space characters from the specified string.
trimstart(STRING) String Removes leading white space characters from the specified string.
trimend(STRING) String Removes trailing white space characters from the specified string.
unicode_char(NUM) CHAR Input must be decimal, not hexadecimal values. Returns the character with Unicode value NUM.
unicode_value(CHAR) NUM Returns the Unicode value of CHAR.
uppertolower(CHAR) uppertolower (STRING) CHAR or String Input can be either a string or character and is used in this function to return a new item of the same type with any uppercase characters converted to their lowercase equivalents.
Remember to specify strings with double quotes and characters with single back quotes. Simple field names should be specified without quotes.