This package implements a simple recursive descent parser. Copyright Pascal J. Bourguignon 2006 - 2015 This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
|
*non-terminal-stack* |
variable |
For error reporting.
Initial value: NIL
ACCEPT
ADVANCE-LINE
|
(clean-rules rules) |
function |
RULES: a list of (--> non-terminal . rhs) rules. RETURN: an equivalent, cleaned list of rules.
COMPUTE-FOLLOW-SETS
COPY-GRAMMAR
|
(defgrammar name &key terminals start rules (scanner t) (skip-spaces t) (eof-symbol *eof-symbol*) (target-language lisp) (trace NIL)) |
macro |
DO: This macro generates a simple scanner and recursive decent parser
for the language described by this grammar.
For each <non-terminal> in the grammar, a function named
<name>/PARSE-<non-terminal> is generated, in addition to
functions SCAN-<name> and PARSE-<name>.
The grammar structure is also generated for run-time
in the global special variable <name>.
TERMINALS:
A list describing the terminal, of form either:
(terminal-name match-regexp)
(terminal-name match-regexp / exclude-regexp)
In the first form, if the match-regexp matches, left-anchored
to the current position, then the corresponding terminal is
recognized.
In the second form, if the match-regexp matches, left-anchored
to the current position, and the exclude-regexp DOES NOT
match, left-anchored to the first position following the
match, then the corresponding terminal is recognized.
Note that terminals don't necessarily have a name since they
may be written directly in the grammar rules as strings.
SCANNER:
Can be either:
- T A scanner is generated.
- NIL No scanner is generated.
- a class-name subclass of COM.INFORMATIMAGO.COMMON-LISP.PARSER.SCANNER
which is used to get tokens.
SKIP-SPACES:
When false, the spaces are not eaten between token. It's up
to the grammar or the scanner to deal with them. (Only when a
scanner is generated with :scanner t).
START:
The start symbol (non-terminal).
RULES:
A list of grammar rules.
EOF-SYMBOL:
The end-of-file symbol (terminal). Default is the value bound
to *EOF-SYMBOL* at macroexpansion-time.
TARGET-LANGUAGE:
Specifies the language into which the code is generated, as a
keyword. The actions must still be written as lisp
expressions, but to generate the language specified. There
must be a set of methods specialized on this target-language
keyword.
TRACE:
When true, the parser functions are traced..
SYNTAX:
(defgrammar <name>
:terminals (( <terminal> "regexp" [ / "regexp" ]) ...)
:start <non-terminal>
:rules ((--> <non-terminal> <items> ) ...))
<items> ::= | <item> <items>
<item> ::= <seq> | <rep> | <alt> | <opt>
| <non-terminal> | <literal-terminal> | <terminal>
<seq> ::= (SEQ <items> <action>) ; <items> may be empty.
<rep> ::= (REP <item> <items> <action>)
<opt> ::= (OPT <item> <items> <action>)
<alt> ::= (ALT <item> <items>)
<action> ::= | :ACTION <forms>
<forms> ::= | <form> <forms>
<form> ::= form -- any lisp form.
<non-terminal> ::= symbol -- any lisp symbol (keywords reserved).
<terminal> ::= symbol -- any lisp symbol (keywords reserved).
<literal-terminal> ::= string -- any lisp string.
SEMANTICS:
The terminals are either named terminals listed in the :TERMINALS
clause, or literal terminals written directly in the productions as
lisp strings. They are matched as-is.
An extended regular expression regex(7) may be given that
will be matched by the scanner to infer the given terminal.
The literal terminals are matched first, the longest first,
and with ([A-Za-z0-9]|$) appended to terminals ending in a
letter or digit (so that they don't match when part of a
longer identifier).
Then the regular expressions are matched in the order given.
A second regexp can be given for a terminal, which must not be
matched, after the first regexp, for the terminal to be
recognized. (:terminals ((alpha "[A-Za-z]+" / "[0-9]")))
will recognize "abcd, efgh" as the ALPHA "abcd", but
won't recognize "abcd42, efgh" as an ALPHA.
:START specifies the start non-terminal symbol.
The non-terminal symbols are infered implicitely from the grammar rules.
If there are more than one subforms, or an action,
the REP and OPT forms take an implicit SEQ:
(REP a b c :ACTION f) --> (REP (SEQ a b c :ACTION f))
(OPT a b c :ACTION f) --> (OPT (SEQ a b c :ACTION f))
(REP a :ACTION f) --> (REP (SEQ a :ACTION f))
(OPT a :ACTION f) --> (OPT (SEQ a :ACTION f))
(REP a) --> (REP a)
(OPT a) --> (OPT a)
Embedded ALT are flattened:
(ALT a b (ALT c d) e f) --> (ALT a b c d e f)
Actions are executed in a lexical environment where the
symbols $1, $2, etc are bound to the results of the
subforms. $0 is bound to the list of the results of the
subforms. In addition, for each non-terminal the non-terminal
symbol is bound to the result of the corresponding subform.
When the non-terminal is present several times in the
production, a dot and an index number is appended.
(--> example
(seq thing item "," item (opt "," item
:action (list item))
:action (list thing item.1 item.2 $5)))
would build a list with the results of parsing thing, the
two items, and the optional part.
The action for REP is to return a possibly empty list of the
results of its single subform repeated. (REP is 0 or more).
The action for OPT is to return either NIL, or the result of
its single subform unchanged.
The action for an ALT is to return the result of the selected
alternative unchanged.
The default action for an internal SEQ is to return the list of the
results of its subforms.
The default action for an implicit SEQ for a given <non-terminal> lhs
is to return the list of the results of its subforms prefixed by the
<non-terminal> symbol.
TODO: We could also flatten sequences without action, or even sequences with
actions with renumbering.
FIND-RHS
FIND-RHSES
|
(firsts-set grammar non-terminal-or-sentence) |
generic-function |
RETURN: the firsts-set of the NON-TERMINAL-OR-SENTENCE in the GRAMMAR.
|
(follow-set grammar non-terminal) |
generic-function |
RETURN: the follow-set of the NON-TERMINAL in the GRAMMAR.
|
(generate-grammar name &key terminals scanner skip-spaces start rules eof-symbol target-language trace) |
function |
SEE ALSO: The docstring of DEFGRAMMAR. RETURN: A form that defines the grammar object and its parser functions.
GRAMMAR
GRAMMAR-ALL-NON-TERMINALS
GRAMMAR-ALL-TERMINALS
GRAMMAR-NAME
|
(grammar-named name) |
function |
Returns the grammar named NAME, or NIL if none.
GRAMMAR-RULES
GRAMMAR-SKIP-SPACES
GRAMMAR-START
GRAMMAR-TERMINALS
MAKE-GRAMMAR
NON-TERMINAL-P
|
(normalize-grammar grammar) |
generic-function |
Return a new normalized grammar parsing the same language as GRAMMAR.
NULLABLEP
PARSER-END-OF-SOURCE-NOT-REACHED
|
parser-error |
condition |
A parser error.
Class precedence list: PARSER-ERROR ERROR SERIOUS-CONDITION CONDITION STANDARD-OBJECT T
Class init args: FILE LINE COLUMN GRAMMAR SCANNER NON-TERMINAL-STACK FORMAT-CONTROL FORMAT-ARGUMENTS
PARSER-ERROR-COLUMN
|
(parser-error-format-arguments x) |
generic-function |
The error message format control arguments.
|
(parser-error-format-control x) |
generic-function |
The error message format control string.
PARSER-ERROR-GRAMMAR
PARSER-ERROR-LINE
PARSER-ERROR-NON-TERMINAL-STACK
PARSER-ERROR-SCANNER
RDP-SCANNER
REGISTER-GRAMMAR
|
(scan-next-token scanner &optional parser-data) |
generic-function |
DO: Scans a new token and store it into (scanner-current-token scanner) PARSER-DATA: Some parsers give information to the scanner. RETURN: (scanner-current-token scanner).
SCANNER-BUFFER
|
(scanner-column x) |
generic-function |
The number of the current column.
|
(scanner-current-text x) |
generic-function |
Text of the current token
|
(scanner-current-token x) |
generic-function |
The last token read.
SCANNER-END-OF-SOURCE-P
|
(scanner-line x) |
generic-function |
The number of the current line.
|
(scanner-spaces x) |
generic-function |
A string containing the characters considered space by SKIP-SPACES.
|
(scanner-state x) |
generic-function |
The state of the scanner.
|
(scanner-tab-width x) |
generic-function |
TAB aligns to column number modulo TAB-WIDTH.
|
(skip-spaces scanner) |
generic-function |
DO: Skips over the spaces in the input stream. Updates line and column slots. RETURN: line; column
TERMINALP
|
token |
class |
A syntactic element.
Class precedence list: TOKEN SLOTED-OBJECT STANDARD-OBJECT T
Class init args: KIND TEXT COLUMN LINE
|
(token-column x) |
generic-function |
Returns the column of the first character of the token.
|
(token-kind x) |
generic-function |
Returns the kind of the token.
|
(token-line x) |
generic-function |
Returns the line where the token was found.
|
(token-text x) |
generic-function |
Returns the literal text the token.
UNEXPECTED-TOKEN-ERROR
UNEXPECTED-TOKEN-ERROR-EXPECTED-TOKENS
UNEXPECTED-TOKEN-ERROR-NON-TERMINAL-STACK
WORD-EQUAL