Klacks parser
The Klacks parser provides an alternative parsing interface,
similar in concept to Java's Streaming API for
XML (StAX).
It implements a streaming, "pull-based" API. This is different
from SAX, which is a "push-based" model.
Klacks is implemented using the same code base as the SAX parser
and has the same parsing characteristics (validation, namespace
support, entity resolution) while offering a more flexible interface
than SAX.
See below for examples.
Parsing incrementally using sources
To parse using Klacks, create an XML source first.
Function CXML:MAKE-SOURCE (input &key validate
dtd root entity-resolver disallow-internal-subset buffering pathname)
Create and return a source for input.
Exact behaviour depends on input, which can
be one of the following types:
- pathname -- a Common Lisp pathname.
Open the file specified by the pathname and create a source for
the resulting stream. See below for information on how to
close the stream.
- stream -- a Common Lisp stream with element-type
(unsigned-byte 8). See below for information on how to
close the stream.
- octets -- an (unsigned-byte 8) array.
The array is parsed directly, and interpreted according to the
encoding it specifies.
- string/rod -- a rod (or string on
unicode-capable implementations).
Parses an XML document from the input string that has already
undergone external-format decoding.
Closing streams: Sources can refer to Lisp streams that
need to be closed after parsing. This includes a stream passed
explicitly as input, a stream created implicitly for the
pathname case, as well as any streams created
automatically for external parsed entities referred to by the
document.
All these streams get closed automatically if end of file is
reached normally. Use klacks:close-source or
klacks:with-open-source to ensure that the streams get
closed otherwise.
Buffering: By default, the Klacks parser performs buffering
of octets being read from the stream as an optimization. This can
result in unwanted blocking if the stream is a socket and the
parser tries to read more data than required to parse the current
event. Use :buffering nil to disable this optimization.
- buffering -- Boolean, defaults to t. If
enabled, read data several kilobytes at a time. If disabled,
read only a single byte at a time.
The following keyword arguments have the same meaning as
with the SAX parser. Please refer to the documentation of parse-file for more information:
- validate
- dtd
- root
- entity-resolver
- disallow-internal-subset
In addition, the following argument is for types of input
other than pathname:
- pathname -- If specified, defines the base URI of the
document based on this pathname instance.
Events are read from the stream using the following functions:
Function KLACKS:PEEK (source)
=> :start-document
or => :start-document, version, encoding, standalonep
or => :dtd, name, public-id, system-id
or => :start-element, uri, lname, qname
or => :end-element, uri, lname, qname
or => :characters, data
or => :processing-instruction, target, data
or => :comment, data
or => :end-document, data
or => nil
peek returns the current event's key and main values.
Function KLACKS:PEEK-NEXT (source) => key, value*
Advance the source forward to the next event and return it
like peek would.
Function KLACKS:PEEK-VALUE (source) => value*
Like peek, but return only the values, not the key.
Function KLACKS:CONSUME (source) => key, value*
Return the same values peek would, and in addition
advance the source forward to the next event.
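The interplay of peek and consume can be sketched as a simple event loop. This is a minimal illustration, not part of the library: the helper name event-keys is hypothetical, and the first two forms assume that ASDF is available and can locate the CXML system.

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun event-keys (xml-string)
  "Return the keys of all events read from XML-STRING, in document order."
  (let ((source (cxml:make-source xml-string)))
    (loop for key = (klacks:peek source)   ; look at the current event
          while key                         ; NIL once the input is exhausted
          collect key
          do (klacks:consume source))))     ; step to the next event
```

Calling (event-keys "<example>text</example>") should yield the same sequence of keys as the peek-next transcript in the Examples section.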
Function KLACKS:CURRENT-URI (source) => uri
Function KLACKS:CURRENT-LNAME (source) => string
Function KLACKS:CURRENT-QNAME (source) => string
If the current event is :start-element or :end-element, return the
corresponding value. Else, signal an error.
Function KLACKS:CURRENT-CHARACTERS (source) => string
If the current event is :characters, return the character data
value. Else, signal an error.
Function KLACKS:CURRENT-CDATA-SECTION-P (source) => boolean
If the current event is :characters, determine whether the data was
specified using a CDATA section in the source document. Else,
signal an error.
Function KLACKS:MAP-CURRENT-NAMESPACE-DECLARATIONS (fn source) => nil
For use only on :start-element and :end-element events, this
function reports every namespace declaration on the current element.
On :start-element, these correspond to the xmlns attributes of the
start tag. On :end-element, the declarations of the corresponding
start tag are reported. No inherited namespaces are
included. fn is called once for each declaration with two
arguments, the prefix and uri.
Function KLACKS:MAP-ATTRIBUTES (fn source)
Call fn for each attribute of the current start tag in
turn, and pass the following values as arguments to the function:
- namespace uri
- local name
- qualified name
- attribute value
- a boolean indicating whether the attribute was specified
explicitly in the source document, rather than defaulted from
a DTD
Only valid for :start-element.
Function KLACKS:LIST-ATTRIBUTES (source)
Return a list of SAX attribute structures for the current start tag.
Only valid for :start-element.
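As a sketch of how map-attributes might be used, the following helper collects the attributes of the current start tag into an association list. The name start-tag-attributes is hypothetical, the load forms assume ASDF can locate CXML, and the example assumes a unicode-capable implementation where rods are strings.

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun start-tag-attributes (source)
  "Return an alist of (qname . value) for the current :start-element event."
  (let ((result '()))
    (klacks:map-attributes
     ;; fn receives the five values described above.
     (lambda (uri lname qname value explicitp)
       (declare (ignore uri lname explicitp))
       (push (cons qname value) result))
     source)
    (nreverse result)))

(defun item-attributes (xml-string)
  "Find the first <item> element in XML-STRING and return its attributes."
  (let ((source (cxml:make-source xml-string)))
    (when (klacks:find-element source "item")
      (start-tag-attributes source))))
```

For example, (item-attributes "<item id='42' lang='en'/>") returns an alist that contains entries for "id" and "lang".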
Function KLACKS:CLOSE-SOURCE (source)
Close all streams referred to by source.
Macro KLACKS:WITH-OPEN-SOURCE ((var source) &body body)
Evaluate source to create a source object, bind it to
symbol var and evaluate body as an implicit progn.
Call klacks:close-source to close the source after
exiting body, whether normally or abnormally.
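A minimal sketch of with-open-source follows. It returns from the loop before the end of the document is reached, which is the situation in which the source would otherwise stay open. The helper name first-element-name is hypothetical, and the load forms assume ASDF can locate CXML.

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun first-element-name (xml-string)
  "Return the local name of the first start tag in XML-STRING, or NIL."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop for key = (klacks:peek-next source)
          while key
          ;; Early exit: the macro still closes the source for us.
          when (eq key :start-element)
            return (klacks:current-lname source))))
```

The same pattern applies to pathname and stream inputs, where closing the source matters most because an operating system stream is involved.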
Convenience functions
Function KLACKS:FIND-EVENT (source key)
Read events from source and discard them until an event
of type key is found. Return values like peek, or
NIL if no such event was found.
Function KLACKS:FIND-ELEMENT (source &optional
lname uri)
Read events from source and discard them until an event
of type :start-element with matching local name and
namespace uri is found. If lname is nil, any
tag name matches. If uri is nil, any
namespace matches. Return values like peek, or NIL if no
such event was found.
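The following sketch uses find-element in a loop to collect the text of every element with a given local name. The helper name element-texts is hypothetical, the load forms assume ASDF can locate CXML, and the code assumes that each matching element directly contains a single run of character data.

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun element-texts (xml-string lname)
  "Return the character data found directly after each LNAME start tag."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop while (klacks:find-element source lname)
          ;; Advance past the start tag so that the next search
          ;; begins after it, and look at the event that follows.
          append (multiple-value-bind (key data)
                     (klacks:peek-next source)
                   (when (eq key :characters)
                     (list data))))))
```

Elements that are empty or that start with a child element contribute nothing to the result in this sketch.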
Condition KLACKS:KLACKS-ERROR (xml-parse-error)
The condition class signalled by expect.
Function KLACKS:EXPECT (source key &optional
value1 value2 value3)
Assert that the current event is equal to (key value1 value2
value3). (Ignore value arguments that are NIL.) If so,
return it as multiple values. Otherwise signal a
klacks-error.
Function KLACKS:SKIP (source key &optional
value1 value2 value3)
expect the specific event, then consume it.
Macro KLACKS:EXPECTING-ELEMENT ((fn source
&optional lname uri) &body body)
Assert that the current event matches (:start-element uri lname).
(Ignore value arguments that are NIL.) Otherwise signal a
klacks-error.
Evaluate body as an implicit progn. Finally, assert that
the event remaining after body matches (:end-element uri lname).
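A small sketch of expect and skip, checking the name of the document element. The helper name root-named-p is hypothetical, the load forms assume ASDF can locate CXML, and the code assumes that the document element follows :start-document directly (no DTD, comment or processing instruction in between).

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun root-named-p (xml-string lname)
  "Return true if the document element of XML-STRING has local name LNAME."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (handler-case
        (progn
          ;; Check for :start-document and move past it.
          (klacks:skip source :start-document)
          ;; NIL as the first value means the namespace uri is not checked.
          (klacks:expect source :start-element nil lname)
          t)
      ;; A failed assertion is reported as a klacks-error.
      (klacks:klacks-error () nil))))
```

Here (root-named-p "<example/>" "example") is true, while a non-matching name makes expect signal a klacks-error, which the handler turns into NIL.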
Bridging Klacks and SAX
Function KLACKS:SERIALIZE-EVENT (source handler)
Send the current klacks event from source as a SAX
event to the SAX handler and consume it.
Function KLACKS:SERIALIZE-ELEMENT (source handler
&key document-events)
Read all klacks events from the following :start-element to
its :end-element and send them as SAX events
to handler. When this function is called, the current
event must be :start-element, else an error is
signalled. With document-events (the default),
sax:start-document and sax:end-document events
are sent around the element.
Function KLACKS:SERIALIZE-SOURCE (source handler)
Read all klacks events from source and send them as SAX
events to the SAX handler.
Class KLACKS:TAPPING-SOURCE (source)
A klacks source that relays events from an upstream klacks source
unchanged, while also emitting them as SAX events to a
user-specified handler at the same time.
Function KLACKS:MAKE-TAPPING-SOURCE
(upstream-source &optional sax-handler)
Create a tapping source relaying events
for upstream-source, and sending SAX events
to sax-handler.
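As an illustration of serialize-element with a handler other than the xmls builder, the sketch below writes one element back out as a string. It assumes that cxml:make-string-sink is used as the SAX handler, with :omit-xml-declaration-p suppressing the XML declaration; the helper name element-as-string is hypothetical, and the load forms assume ASDF can locate CXML.

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun element-as-string (xml-string lname)
  "Serialize the first LNAME element of XML-STRING to a string, or return NIL."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    ;; serialize-element requires the current event to be :start-element,
    ;; which is the case right after a successful find-element.
    (when (klacks:find-element source lname)
      (klacks:serialize-element
       source
       (cxml:make-string-sink :omit-xml-declaration-p t)))))
```

The result is whatever the sink returns at the end of the document, so the exact formatting of the string depends on the sink's settings.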
Location information
Function KLACKS:CURRENT-LINE-NUMBER (source)
Return an approximation of the current line number, or NIL.
Function KLACKS:CURRENT-COLUMN-NUMBER (source)
Return an approximation of the current column number, or NIL.
Function KLACKS:CURRENT-SYSTEM-ID (source)
Return the URI of the document being parsed. This is either the
main document, or the entity's system ID while contents of a parsed
general external entity are being processed.
Function KLACKS:CURRENT-XML-BASE (source)
Return the [Base URI] of the current element. This URI can differ from
the value returned by current-system-id if xml:base
attributes are present.
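The location functions can be combined with the event loop, for example to report where each element starts. This is a sketch only: the helper name element-lines is hypothetical, the load forms assume ASDF can locate CXML, and the reported line numbers are approximations that may also be NIL.

```lisp
(require :asdf)
(asdf:load-system :cxml)   ; assumption: ASDF can find CXML

(defun element-lines (xml-string)
  "Return a list of (lname . line) pairs, one per start tag in XML-STRING."
  (klacks:with-open-source (source (cxml:make-source xml-string))
    (loop for key = (klacks:peek-next source)
          while key
          when (eq key :start-element)
            collect (cons (klacks:current-lname source)
                          ;; Approximate line number, or NIL if unknown.
                          (klacks:current-line-number source)))))
```

Such pairs are mainly useful for error messages that point the user at the approximate position of a problem in the input.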
Examples
The following example illustrates creation of a klacks source,
use of the peek-next function to read individual events,
and shows some of the most common event types.
* (defparameter *source* (cxml:make-source "<example>text</example>"))
*SOURCE*
* (klacks:peek-next *source*)
:START-DOCUMENT
* (klacks:peek-next *source*)
:START-ELEMENT
NIL ;namespace URI
"example" ;local name
"example" ;qualified name
* (klacks:peek-next *source*)
:CHARACTERS
"text"
* (klacks:peek-next *source*)
:END-ELEMENT
NIL
"example"
"example"
* (klacks:peek-next *source*)
:END-DOCUMENT
* (klacks:peek-next *source*)
NIL
In this example, find-element is used to skip over the
uninteresting events until the opening child1 tag is
found. Then serialize-element is used to generate SAX
events for the following element, including its children, and an
xmls-compatible list structure is built from those
events. The second call to find-element skips over the whitespace
between the two child elements, and find-event is then used to parse up
to :end-document, which ensures that the source has been
closed.
* (defparameter *source*
    (cxml:make-source "<example>
<child1><p>foo</p></child1>
<child2 bar='baz'/>
</example>"))
*SOURCE*
* (klacks:find-element *source* "child1")
:START-ELEMENT
NIL
"child1"
"child1"
* (klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))
("child1" NIL ("p" NIL "foo"))
* (klacks:find-element *source*)
:START-ELEMENT
NIL
"child2"
"child2"
* (klacks:serialize-element *source* (cxml-xmls:make-xmls-builder))
("child2" (("bar" "baz")))
* (klacks:find-event *source* :end-document)
:END-DOCUMENT
NIL
NIL
NIL