Package COM.INFORMATIMAGO.COMMON-LISP.HTML-PARSER.PARSE-HTML


This package implements a simple HTML parser.

Example:

        (parse-html-string "<html><head><title>Test</title></head>
        <body><h1>Little Test</h1>
        <p>How dy? <a href=\"/check.html\">Check this</a></p>
        <ul><li>one<li>two<li>three</ul></body></html>")
        --> ((:html nil (:head nil (:title nil "Test")) "
            " (:body nil (:h1 nil "Little Test") "
            " (:p nil "How dy? " (:a (:href "/check.html") "Check this")) "
            " (:ul nil (:li nil "one" (:li nil "two" (:li nil "three")))))))

Sexp HTML format:

    element    ::= (tag (&rest attributes) &rest contents) .
    tag        ::= (or symbol string) . -- usually a keyword.
    attributes ::= list of (name value) .
    contents   ::= list of element | string .
    name       ::= (or symbol string) . -- usually a keyword.
    value      ::= string .
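
Because this format is plain list structure, a tree can be walked with ordinary list operations. The following is only a minimal sketch: NODE-TAG, NODE-ATTRIBUTES, NODE-CONTENTS and COLLECT-TEXT are hypothetical helpers written for illustration, not functions of this package (the package's own accessors are the ones named under SEE ALSO below). The sample tree is written by hand in the shape shown in the example, and the attribute lookup assumes the flat (name value ...) layout seen there.

```lisp
;; Hypothetical helpers, not part of the package: they only assume the
;; (tag attributes . contents) layout described above.
(defun node-tag (element) (first element))
(defun node-attributes (element) (second element))
(defun node-contents (element) (cddr element))

(defun collect-text (node)
  "Return the list of strings found in NODE, in document order."
  (if (stringp node)
      (list node)
      (loop for child in (node-contents node)
            append (collect-text child))))

;; Hand-written sample in the shape produced by the example above.
(defparameter *sample*
  '(:p nil "How dy? " (:a (:href "/check.html") "Check this")))

(collect-text *sample*)
;; => ("How dy? " "Check this")

;; Assumption: attributes form a flat (name value ...) list, as in the example.
(getf (node-attributes (second (node-contents *sample*))) :href)
;; => "/check.html"
```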

License:

    AGPL3

    Copyright Pascal J. Bourguignon 2003 - 2015

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU Affero General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Affero General Public License for more details.

    You should have received a copy of the GNU Affero General Public License
    along with this program.
    If not, see <http://www.gnu.org/licenses/>.

(parse-html-file pathname &key verbose external-format)
function
DO:                 Parse the HTML file PATHNAME.
VERBOSE:            When true, writes some information to *TRACE-OUTPUT*.
EXTERNAL-FORMAT:    The external format used to open the HTML file.
RETURN:             A list of HTML elements.
SEE ALSO:           ELEMENT-TAG, ELEMENT-ATTRIBUTES, ATTRIBUTE-NAMED, ELEMENT-CHILDREN.
(parse-html-stream stream &key verbose)
function
DO:                 Parse the HTML stream STREAM.
VERBOSE:            When true, writes some information to *TRACE-OUTPUT*.
RETURN:             A list of HTML elements.
SEE ALSO:           ELEMENT-TAG, ELEMENT-ATTRIBUTES, ATTRIBUTE-NAMED, ELEMENT-CHILDREN.
(parse-html-string string &key start end verbose)
function
DO:                 Parse the HTML in STRING (between START and END).
VERBOSE:            When true, writes some information to *TRACE-OUTPUT*.
RETURN:             A list of HTML elements.
SEE ALSO:           ELEMENT-TAG, ELEMENT-ATTRIBUTES, ATTRIBUTE-NAMED, ELEMENT-CHILDREN.
(unparse-html html &optional stream)
function
Writes back on STREAM the reconstituted HTML source.
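
To illustrate what writing a tree back as HTML source involves, here is a minimal sketch of a serializer for the sexp format. SKETCH-UNPARSE is a hypothetical name and this is not the package's UNPARSE-HTML: it assumes a flat (name value ...) attribute list as in the example at the top, always emits a closing tag, and performs no character escaping.

```lisp
;; A hypothetical serializer, NOT the package's UNPARSE-HTML.
;; Assumes a flat (name value ...) attribute list and does no escaping.
(defun sketch-unparse (node &optional (stream *standard-output*))
  (if (stringp node)
      (write-string node stream)
      (destructuring-bind (tag attributes &rest contents) node
        ;; ~( ... ~) downcases the printed tag, so :P is written as p.
        (format stream "<~(~A~)" tag)
        (loop for (name value) on attributes by #'cddr
              do (format stream " ~(~A~)=\"~A\"" name value))
        (write-char #\> stream)
        (dolist (child contents)
          (sketch-unparse child stream))
        (format stream "</~(~A~)>" tag)))
  (values))

(with-output-to-string (out)
  (sketch-unparse '(:p nil "See " (:a (:href "/check.html") "this")) out))
;; => "<p>See <a href=\"/check.html\">this</a></p>"
```

A real serializer would also need to escape special characters in text and attribute values and to treat elements without a closing tag specially; the sketch leaves both out.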
(write-html-text html &optional stream)
function
Writes on STREAM a textual rendering of the HTML.
Some reStructuredText formatting is used.
Simple tables are rendered, but colspan and rowspan are ignored.