Tokenizes string and performs folding on each token.
A token is a non-empty sequence of alphanumeric characters in the
source string, separated by non-alphanumeric characters. For this
purpose, a character is "alphanumeric" if GLib.unichar_isalnum or
GLib.unichar_ismark returns true for it.
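As an illustration only, the token-splitting rule above can be approximated in plain TypeScript with a Unicode property regular expression. This is a sketch, not the GLib implementation: the hypothetical tokenize() helper assumes that the letter and number classes \p{L} and \p{N} correspond to what GLib.unichar_isalnum accepts, and that the mark class \p{M} corresponds to GLib.unichar_ismark. It performs no normalisation or folding.

```typescript
// Hypothetical helper, not part of GLib: approximates the token-splitting rule.
// A token is a maximal run of letters (\p{L}), numbers (\p{N}) or marks (\p{M});
// these classes are assumed to mirror GLib.unichar_isalnum and GLib.unichar_ismark.
const TOKEN_RE = /[\p{L}\p{N}\p{M}]+/gu;

function tokenize(source: string): string[] {
  // With the global flag, match() returns every run, or null if there are none.
  return Array.from(source.match(TOKEN_RE) ?? []);
}

const words = tokenize("Hello, World! 42"); // ["Hello", "World", "42"]
```

Because combining marks count as token characters, a decomposed sequence such as "e" followed by U+0301 stays inside a single token rather than splitting it.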
Each token is then (Unicode) normalised and case-folded. If
ascii_alternates is non-NULL and some of the returned tokens
contain non-ASCII characters, ASCII alternatives will be generated.
The number of ASCII alternatives that are generated and the method
for doing so are unspecified, but translit_locale (if specified) may
improve the transliteration if the language of the source string is
known.
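A rough sketch of the folding step follows, again in TypeScript and under stated assumptions. JavaScript has no built-in Unicode case folding, so the hypothetical foldToken() uses toLowerCase() as a stand-in (it differs from full case folding for characters such as "ß"), and NFKC is an assumed normalisation form. Because the real transliteration method is unspecified, the hypothetical asciiAlternates() merely decomposes each non-ASCII token, drops combining marks and keeps the result only if it is entirely ASCII alphanumeric; it ignores translit_locale.

```typescript
// Hypothetical helpers, not part of GLib.

// Normalise and lower-case one token. NFKC is an assumed normalisation form,
// and toLowerCase() only approximates Unicode case folding.
function foldToken(token: string): string {
  return token.normalize("NFKC").toLowerCase();
}

// True if the string contains any character outside the 7-bit ASCII range.
function hasNonAscii(s: string): boolean {
  return /[^\x00-\x7F]/.test(s);
}

// Naive stand-in for transliteration: decompose, strip combining marks, and
// keep the result only if every remaining character is an ASCII letter or digit.
function asciiAlternates(tokens: string[]): string[] {
  const alternates: string[] = [];
  for (const token of tokens) {
    if (!hasNonAscii(token)) continue; // ASCII tokens need no alternative
    const stripped = token.normalize("NFKD").replace(/\p{M}/gu, "");
    if (/^[A-Za-z0-9]+$/.test(stripped)) alternates.push(stripped);
  }
  return alternates;
}

const folded = ["CAF\u00C9", "Plain", "\u65E5\u672C"].map(foldToken);
const alternates = asciiAlternates(folded); // ["cafe"]
```

In this sketch a token such as "日本" yields no alternative at all, which is consistent with the documentation leaving the number of generated alternatives unspecified.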
Parameters
string: string
a string to tokenize
Optional translit_locale: string
the language code (like 'de' or 'en_GB') from which string originates