HtmlRewriter (open)
An HTML parser
Syntax
LOADLIB "wh::filetypes/html.whlib";
OBJECTTYPE HtmlRewriter
Constructor
Variables
- STRING ARRAY allowed_attrs
List of allowed attributes (in addition to our standard per-tag list), if strict filtering is enabled
- STRING ARRAY allowed_tags
List of allowed tags, if strict filtering is enabled
- BOOLEAN allow_comments
Whether to allow comment tags
- BOOLEAN allow_scripting
Whether to allow tags that directly or indirectly allow scripting. Disabled by default
- BOOLEAN cleanup_msoffice
If enabled (default), try to clean up MS Office noise (strictly, this breaks (X)HTML conformance)
- BOOLEAN clean_newlines
Replace all newlines with spaces (for Flash, which renders newlines)
- BOOLEAN debug
If enabled, prints out a lot of debug info
- STRING ARRAY disallowed_attrs
List of disallowed attributes
- RECORD ARRAY htmltags
HTML tag listing used to filter (see beginning of this file for its format)
- INTEGER max_content_length
Maximum text length (defaults to -1/no limit)
- FUNCTION PTR rewrite_hyperlink
If set, this function is called for every hyperlink encountered within attributes (see the 'links' attribute in the html array for the attributes that are handled as hyperlinks). It must return the rewritten hyperlink (signature: STRING FUNCTION(STRING hyperlink))
- FUNCTION PTR rewrite_img
If set, this function is called for every image URL encountered within attributes (see the 'imgs' attribute in the html array for the attributes that are handled as image URLs). It must return the rewritten image URL (signature: STRING FUNCTION(STRING imageurl))
- FUNCTION PTR rewrite_link
If set, this function is called for every link encountered within attributes (see the 'links' attribute in the html array for the attributes that are handled as links). It must return the rewritten link (signature: STRING FUNCTION(STRING link))
- BOOLEAN strict_filtering
Enable filtering based on allowed_tags and allowed_attrs, and filter all unknown attributes
- BOOLEAN trim_whitespace
Strip whitespace (whitespace characters and empty elements) from the beginning and end of the document
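The variables above could be combined as in the following minimal sketch. It assumes that the constructor takes no arguments and that tag and attribute names are given in lowercase; neither is stated in this reference. ForceHttps is a hypothetical callback written only to match the documented signature STRING FUNCTION(STRING hyperlink).

```harescript
<?wh
LOADLIB "wh::filetypes/html.whlib";

// Hypothetical callback: upgrades plain http links to https and
// returns every other hyperlink unchanged.
STRING FUNCTION ForceHttps(STRING hyperlink)
{
  IF (hyperlink LIKE "http://*")
    RETURN "https://" || SubString(hyperlink, 7); // skip the 7 characters of "http://"
  RETURN hyperlink;
}

// Assumption: the constructor needs no arguments
OBJECT rewriter := NEW HtmlRewriter;

// Only let through an explicit set of tags and attributes
// (assumption: names are written in lowercase)
rewriter->strict_filtering := TRUE;
rewriter->allowed_tags := [ "p", "a", "b", "i" ];
rewriter->allowed_attrs := [ "href" ];

// Drop comments and cap the amount of text that is kept
rewriter->allow_comments := FALSE;
rewriter->max_content_length := 2000;

// Route every hyperlink through the callback above
rewriter->rewrite_hyperlink := PTR ForceHttps;
```

allow_scripting is left untouched here because it is documented as disabled by default.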
Properties
- PROPERTY content_truncated
Whether the text was truncated because max_content_length was reached
- PROPERTY fix_loose_tags
- PROPERTY parsed_text
Parsed (and rewritten) html
- PROPERTY parsemode
Parse mode, one of "inline", "block" or "html". Defaults to "html"
Functions
- MACRO ParseXmlDocument(OBJECT doc)
- MACRO ParseXmlNodeContents(OBJECT node)
Parse the contents of a node. Only works for the parse modes "inline" and "block"
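A typical round trip might look like the sketch below: parse a document, then read the result from the properties. Producing the XML document is outside the scope of this page, so the sketch assumes that MakeXMLDocumentFromHTML (from wh::xml/dom.whlib) and StringToBlob (from wh::files.whlib) are available with the signatures used here, and that the default parse mode "html" is the right one for ParseXmlDocument.

```harescript
<?wh
LOADLIB "wh::files.whlib";          // assumed source of StringToBlob
LOADLIB "wh::xml/dom.whlib";        // assumed source of MakeXMLDocumentFromHTML
LOADLIB "wh::filetypes/html.whlib";

// Build an XML document from an HTML string (assumed helper functions)
STRING source := "<html><body><p>Hello <b>world</b></p></body></html>";
OBJECT doc := MakeXMLDocumentFromHTML(StringToBlob(source), "UTF-8");

// Assumption: the constructor needs no arguments
OBJECT rewriter := NEW HtmlRewriter;
rewriter->max_content_length := 5; // deliberately small, to exercise truncation

// Run the rewriter over the whole document
rewriter->ParseXmlDocument(doc);

// parsed_text holds the parsed (and rewritten) HTML
PRINT(rewriter->parsed_text || "\n");

// content_truncated reports whether max_content_length was reached
IF (rewriter->content_truncated)
  PRINT("The content was truncated\n");
```

To rewrite a fragment rather than a full document, parsemode would be set to "inline" or "block" and ParseXmlNodeContents called with the node in question.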