TokenStream (open)
Split text into tokens
Syntax
LOADLIB "wh::util/langspecific.whlib";
OBJECTTYPE TokenStream
Description
This object reads text and returns the tokens found within it. Tokens are units of text, such as words, whitespace, or punctuation. If a language is specified, words are normalized (accents removed, converted to lowercase) and stemmed (reduced to a base form).
Constructor
- MACRO NEW(STRING text, STRING langcode)
Create a new TokenStream for the given text, using the given language code for normalizing and stemming
Properties
- PROPERTY language
Set the language to use for normalizing and stemming
Functions
- MACRO AddText(STRING text)
Add (more) text to parse
- MACRO Close()
Deinitialize the TokenStream and free resources
- RECORD FUNCTION GetNextToken()
Return the next token in the stream
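Example
A minimal usage sketch, using only the constructor, property, and functions listed above. The sample text and the language codes "en" and "nl" are assumptions for illustration; this page does not document which language codes are accepted, which cells the returned token record contains, or how the end of the stream is signalled, so the sketch reads a single token and does not inspect it.

```
<?wh
LOADLIB "wh::util/langspecific.whlib";

// Create a stream over some initial text ("en" is an assumed language code)
OBJECT stream := NEW TokenStream("The quick brown foxes", "en");

// Add more text to parse
stream->AddText(" jumped over the lazy dogs.");

// Fetch the next token; the layout of this record is not described on this page
RECORD token := stream->GetNextToken();

// The language can also be changed through the property ("nl" is an assumed code)
stream->language := "nl";

// Free the stream's resources when done
stream->Close();
```

Calling Close() once the stream is no longer needed matches its description above, since it deinitializes the TokenStream and frees its resources.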