The parser language is a declarative language for specifying a
parser procedure. A parser procedure is a procedure that
accepts a single parser-buffer argument and parses some of the input
from the buffer. If the parse is successful, the procedure returns a
vector of objects that are the result of the parse, and the internal
pointer of the parser buffer is advanced past the input that was
parsed. If the parse fails, the procedure returns #f and the
internal pointer is unchanged. This interface is much like that of a
matcher procedure, except that on success the parser procedure returns
a vector of values rather than #t.
The *parser special form is the interface between the parser
language and Scheme.
The operand pexp is an expression in the parser language. The
*parser expression expands into Scheme code that implements a parser procedure.
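As a minimal sketch of this interface, the following defines a parser procedure with *parser and applies it twice to the same buffer. The names parse-digits, buffer, first-result, and second-result are hypothetical. The sketch assumes MIT/GNU Scheme, where the parser language is loaded with (load-option '*parser) and string->parser-buffer makes a parser buffer from a string.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

;; Hypothetical parser procedure: captures a run of one or more digits.
(define parse-digits
  (*parser (match (+ (char-set char-set:numeric)))))

(define buffer (string->parser-buffer "123abc"))

;; Success: a vector of results, and the buffer pointer moves past "123".
(define first-result (parse-digits buffer))

;; Failure: the remaining input starts with #\a, so the result is #f
;; and the pointer stays where it was.
(define second-result (parse-digits buffer))
```

Calling the same procedure twice on one buffer shows both halves of the contract: the first call consumes input, the second fails without consuming anything.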
There are several primitive expressions in the parser language. The first two provide a bridge to the matcher language (see *Matcher):
The match expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the match expression is a vector of one element: a string containing the matched text.
The noise expression performs a match on the parser buffer. The match to be performed is specified by mexp, which is an expression in the matcher language. If the match is successful, the result of the noise expression is a vector of zero elements. (In other words, the text is matched and then thrown away.)
The mexp operand is often a known character or string, so when mexp is a character or string literal, the noise expression can be abbreviated as the literal. In other words, ‘(noise "foo")’ can be abbreviated as just ‘"foo"’.
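The difference between match and noise can be sketched by stepping through one buffer with three small parsers. All names here (parse-word, skip-comma, skip-comma-short, and the result variables) are hypothetical, and the sketch assumes MIT/GNU Scheme with the parser language loaded through (load-option '*parser).

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

;; match keeps the matched text as a one-element vector.
(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

;; noise matches the text and discards it.
(define skip-comma (*parser (noise (string ","))))

;; The same parser written with the string-literal abbreviation.
(define skip-comma-short (*parser ","))

(define buffer (string->parser-buffer "foo,bar,baz"))
(define word-1 (parse-word buffer))          ; keeps "foo"
(define skipped-1 (skip-comma buffer))       ; empty vector
(define word-2 (parse-word buffer))          ; keeps "bar"
(define skipped-2 (skip-comma-short buffer)) ; empty vector
(define word-3 (parse-word buffer))          ; keeps "baz"
```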
Sometimes it is useful to be able to insert arbitrary values into the parser result. The values expression supports this. The expression arguments are arbitrary Scheme expressions that are evaluated at run time and returned in a vector. The values expression always succeeds and never modifies the internal pointer of the parser buffer.
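A small sketch of values follows: it injects two values into the result and then checks that the input is still unconsumed. The names parse-marker, buffer, marker-result, and rest-result are hypothetical, and MIT/GNU Scheme with (load-option '*parser) is assumed.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

;; The operands are ordinary Scheme expressions evaluated at run time.
(define parse-marker
  (*parser (values 'marker (+ 1 2))))

(define buffer (string->parser-buffer "abc"))

;; Always succeeds and consumes nothing.
(define marker-result (parse-marker buffer))

;; The whole input is still available to the next parser.
(define rest-result
  ((*parser (match (string "abc"))) buffer))
```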
The discard-matched expression always succeeds, returning a vector of zero elements. In all other respects it is identical to the discard-matched expression in the matcher language.
Next there are several combinator expressions. Parameters named pexp are arbitrary expressions in the parser language. The first few combinators are direct equivalents of those in the matcher language.
The seq expression parses each of the pexp operands in order. If all of the pexp operands parse successfully, the result is the concatenation of their values (by vector-append).
The alt expression attempts to parse each pexp operand in order from left to right. The first one that successfully parses produces the result for the entire alt expression.
Like the alt expression in the matcher language, this expression participates in backtracking.
The * expression parses zero or more occurrences of pexp. The results of the parsed occurrences are concatenated together (by vector-append) to produce the expression's result.
Like the * expression in the matcher language, this expression participates in backtracking.
The + expression parses one or more occurrences of pexp. It is equivalent to
(seq pexp (* pexp))
The ? expression parses zero or one occurrences of pexp. It is equivalent to
(alt pexp (seq))
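The combinators above can be sketched together in two small parsers. The names parse-signed-integer and parse-word-list, and the result variables, are hypothetical. MIT/GNU Scheme with (load-option '*parser) is assumed. Note that the + inside each match form belongs to the matcher language, while seq, alt, *, and ? outside it are parser-language combinators.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

;; An optional sign (? with alt), then one or more digits.
(define parse-signed-integer
  (*parser
   (seq (? (alt (match (char #\+))
                (match (char #\-))))
        (match (+ (char-set char-set:numeric))))))

;; One word, then zero or more ",word" groups; commas are discarded.
(define parse-word-list
  (*parser
   (seq (match (+ (char-set char-set:alphabetic)))
        (* (seq ","
                (match (+ (char-set char-set:alphabetic))))))))

(define signed-result
  (parse-signed-integer (string->parser-buffer "-12")))
(define unsigned-result
  (parse-signed-integer (string->parser-buffer "12")))
(define failed-result
  (parse-signed-integer (string->parser-buffer "x")))
(define words-result
  (parse-word-list (string->parser-buffer "a,bc,d")))
```

Because seq concatenates the result vectors of its operands, the optional sign simply contributes zero or one element, and the repeated group in parse-word-list contributes one element per word.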
The next three expressions do not have equivalents in the matcher language. Each accepts a single pexp argument, which is parsed in the usual way. These expressions perform transformations on the returned values of a successful match.
The transform expression performs an arbitrary transformation of the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and must return a vector or #f. If it returns a vector, the parse is successful, and those are the resulting values. If it returns #f, the parse fails and the internal pointer of the parser buffer is returned to what it was before pexp was parsed.
For example:
(transform (lambda (v) (if (= 0 (vector-length v)) #f v)) ...)
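A fuller sketch of transform used as a filter follows: the procedure rejects one particular word by returning #f, which turns an otherwise successful parse into a failure. The names parse-non-reserved, buffer, and the result variables are hypothetical, and MIT/GNU Scheme with (load-option '*parser) is assumed.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

;; Parse a word, but fail if the word is exactly "if".
(define parse-non-reserved
  (*parser
   (transform (lambda (v)
                (if (string=? (vector-ref v 0) "if") #f v))
     (match (+ (char-set char-set:alphabetic))))))

(define accepted
  (parse-non-reserved (string->parser-buffer "foo")))

(define buffer (string->parser-buffer "if"))
(define rejected (parse-non-reserved buffer))

;; After the rejection the pointer is back at the start, so the
;; same text can still be parsed by another parser.
(define reparsed
  ((*parser (match (string "if"))) buffer))
```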
The encapsulate expression transforms the values returned by parsing pexp into a single value. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is called with the vector of values as its argument, and may return any Scheme object. The result of the encapsulate expression is a vector of length one containing that object. (And consequently encapsulate doesn't change the success or failure of pexp, only its value.)
For example:
(encapsulate vector->list ...)
The map expression performs a per-element transform on the values returned by parsing pexp. Expression is a Scheme expression that must evaluate to a procedure at run time. If pexp is successfully parsed, the procedure is mapped (by vector-map) over the values returned from the parse. The mapped values are returned as the result of the map expression. (And consequently map doesn't change the success or failure of pexp, nor the number of values returned.)
For example:
(map string->symbol ...)
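To show encapsulate and map with a concrete pexp filled in, here is a sketch that parses comma-separated words, converts each to a symbol, and wraps the whole result in a single list. The names parse-symbols, parse-symbol-list, and the result variables are hypothetical, and MIT/GNU Scheme with (load-option '*parser) is assumed.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

;; map: one symbol per parsed word; the number of values is unchanged.
(define parse-symbols
  (*parser
   (map string->symbol
     (seq (match (+ (char-set char-set:alphabetic)))
          (* (seq ","
                  (match (+ (char-set char-set:alphabetic)))))))))

;; encapsulate: collapse all values into one list-valued element.
;; The bare symbol parse-symbols refers to the parser defined above.
(define parse-symbol-list
  (*parser (encapsulate vector->list parse-symbols)))

(define symbols-result
  (parse-symbols (string->parser-buffer "a,b,c")))
(define list-result
  (parse-symbol-list (string->parser-buffer "a,b,c")))
```

Wrapping with encapsulate is a convenient way to keep a variable-length group from spreading its elements into the result vector of an enclosing seq.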
Finally, as in the matcher language, we have sexp and
with-pointer to support embedding Scheme code in the parser.
The sexp expression allows arbitrary Scheme code to be embedded inside a parser. The expression operand must evaluate to a parser procedure at run time; the procedure is called to parse the parser buffer. This is the parser-language equivalent of the sexp expression in the matcher language.
The case in which expression is a symbol is so common that it has an abbreviation: ‘(sexp symbol)’ may be abbreviated as just symbol.
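The following sketch composes one parser from another, once with an explicit sexp form and once with the symbol abbreviation. The names parse-word, parse-pair, and pair-result are hypothetical, and MIT/GNU Scheme with (load-option '*parser) is assumed.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

(define parse-word
  (*parser (match (+ (char-set char-set:alphabetic)))))

;; The first operand uses the symbol abbreviation; the last spells
;; out the equivalent sexp form.
(define parse-pair
  (*parser
   (seq parse-word
        "="
        (sexp parse-word))))

(define pair-result
  (parse-pair (string->parser-buffer "key=value")))
```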
The with-pointer expression fetches the parser buffer's internal pointer (using get-parser-buffer-pointer), binds it to identifier, and then parses the pattern specified by pexp. Identifier must be a symbol. This is the parser-language equivalent of the with-pointer expression in the matcher language.
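One use of with-pointer is to record where a token started. The sketch below skips leading spaces, binds the pointer to p, and places the pointer's index in front of the parsed word. The names parse-located-word and located-result are hypothetical. The sketch assumes MIT/GNU Scheme with (load-option '*parser), and it assumes that parser-buffer-pointer-index returns an integer position for the pointer.

```scheme
;; Assumption: the parser language is a load option in MIT/GNU Scheme.
(load-option '*parser)

(define parse-located-word
  (*parser
   (seq (noise (* (char #\space)))      ; skip leading spaces
        (with-pointer p                 ; p marks where the word starts
          (seq (values (parser-buffer-pointer-index p))
               (match (+ (char-set char-set:alphabetic))))))))

;; The result holds the starting position followed by the word.
(define located-result
  (parse-located-word (string->parser-buffer "  hi")))
```

Since p is an ordinary Scheme variable, any embedded Scheme expression inside the with-pointer body can refer to it, for example to build an error message that reports a position.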