@@ -1,15 +1,13 @@
# Classes - Reference
# Classes Reference
This page details the important classes in Lark.
----
## Lark
## lark.Lark
The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.
### Methods
#### \_\_init\_\_(self, grammar, **options)
The Lark class accepts a grammar string or file object, and keyword options:
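For example, a minimal end-to-end sketch, in the spirit of Lark's README (the grammar string is illustrative; `parser` defaults to `'earley'`):
```python
from lark import Lark

# A minimal sketch of Lark's main entry point.
parser = Lark('''start: WORD "," WORD "!"

                 %import common.WORD   // imports from terminal library
                 %ignore " "           // Disregard spaces in text
              ''')

print(parser.parse("Hello, World!"))
```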
@@ -50,14 +48,10 @@ If a transformer is supplied to `__init__`, returns whatever is the result of th
The main tree class
### Properties
* `data` - The name of the rule or alias
* `children` - List of matched sub-rules and terminals
* `meta` - Line & Column numbers, if using `propagate_positions`
### Methods
#### \_\_init\_\_(self, data, children)
Creates a new tree, and stores "data" and "children" in attributes of the same name.
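A small sketch of working with `Tree` directly (node names are illustrative):
```python
from lark import Tree

# Trees are plain data objects: a `data` label plus a list of children.
t = Tree('expr', [Tree('number', ['1']), Tree('number', ['2'])])
print(t.data)       # expr
print(t.children)   # [Tree(number, ['1']), Tree(number, ['2'])]
print(t.pretty())   # indented, human-readable rendering
```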
@@ -92,102 +86,6 @@ Trees can be hashed and compared.
----
## Transformers & Visitors
Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.
They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each methods accepts the children as an argument. That can be modified using the `v-args` decorator, which allows to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.
See: https://github.com/lark-parser/lark/blob/master/lark/visitors.py
### Visitors
Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.
They work bottom-up, starting with the leaves and ending at the root of the tree.
**Example**
```python
class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```
There are two classes that implement the visitor interface:
* Visitor - Visit every node (without recursion)
* Visitor_Recursive - Visit every node using recursion. Slightly faster.
### Transformers
Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.
They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.
Transformers can be used to implement map & reduce patterns.
Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).
Transformers can be chained into a new transformer by using multiplication.
**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))
# Prints: Tree(a, [3])
```
Here are the classes that implement the transformer interface:
- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
### v_args
`v_args` is a decorator.
By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.
When used on a transformer/visitor class definition, it applies to all the callback methods inside it.
`v_args` accepts one of three flags:
- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.
Examples:
```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right

class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True):
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```
### Discard
When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
## Token
When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes:
@@ -199,17 +97,25 @@ When using a lexer, the resulting tokens in the trees will be of the Token class
* `end_line` - The line where the token ends
* `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.
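A short sketch of reading these attributes (assumes `tree` is a parse result whose first child is a `Token`):
```python
token = tree.children[0]
print(token == "Hello")                  # Tokens compare like ordinary strings
print(token.type)                        # Name of the matched terminal, e.g. 'WORD'
print(token.line, token.column)          # Where the token starts
print(token.end_line, token.end_column)  # Where it ends
```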
## Transformer
## Visitor
## Interpreter
See the [visitors page](visitors.md)
## UnexpectedInput
## UnexpectedToken
## UnexpectedException
- `UnexpectedInput`
- `UnexpectedToken` - The parser received an unexpected token
- `UnexpectedCharacters` - The lexer encountered an unexpected string
After catching one of these exceptions, you may call the following helper methods to create a nicer error message:
### Methods
#### get_context(text, span)
Returns a pretty string pinpointing the error in the text, with `span` characters of context around it.
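For instance, a sketch of error reporting (assumes `parser` and `text` exist from earlier):
```python
from lark import UnexpectedInput

try:
    parser.parse(text)
except UnexpectedInput as e:
    # Show the offending position with 40 characters of surrounding context.
    print(e.get_context(text, span=40))
```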
@@ -1,5 +1,13 @@
# Grammar Reference
Table of contents:
1. [Definitions](#defs)
1. [Terminals](#terms)
1. [Rules](#rules)
1. [Directives](#dirs)
<a name="defs"></a>
## Definitions
**A grammar** is a list of rules and terminals that together define a language.
@@ -25,6 +33,7 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o
Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects for the shape of the generated parse-tree, and for the automatic construction of the lexer (aka tokenizer, or scanner).
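For example, in the sketch below (names are illustrative), `NAME` is compiled into the lexer, while `assignment` shapes the parse-tree:
```perl
assignment : NAME "=" NAME

NAME : /[a-z]+/
```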
<a name="terms"></a> | |||
## Terminals | |||
Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals. | |||
@@ -70,6 +79,53 @@ WHITESPACE: (" " | /\t/ )+ | |||
SQL_SELECT: "select"i | |||
``` | |||
### Regular expressions & Ambiguity | |||
Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions. | |||
For example, in the following grammar, `A1` and `A2`, are equivalent: | |||
```perl | |||
A1: "a" | "b" | |||
A2: /a|b/ | |||
``` | |||
This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley. | |||
For example, for this grammar: | |||
```perl | |||
start : (A | B)+ | |||
A : "a" | "ab" | |||
B : "b" | |||
``` | |||
We get this behavior: | |||
```bash | |||
>>> p.parse("ab") | |||
Tree(start, [Token(A, 'a'), Token(B, 'b')]) | |||
``` | |||
This is happening because Python's regex engine always returns the first matching option. | |||
If you find yourself in this situation, the recommended solution is to use rules instead. | |||
Example: | |||
```python | |||
>>> p = Lark("""start: (a | b)+ | |||
... !a: "a" | "ab" | |||
... !b: "b" | |||
... """, ambiguity="explicit") | |||
>>> print(p.parse("ab").pretty()) | |||
_ambig | |||
start | |||
a ab | |||
start | |||
a a | |||
b b | |||
``` | |||
<a name="rules"></a> | |||
## Rules | |||
**Syntax:** | |||
@@ -114,6 +170,7 @@ Rules can be assigned priority only when using Earley (future versions may suppo | |||
Priority can be either positive or negative. In not specified for a terminal, it's assumed to be 1 (i.e. the default). | |||
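As a sketch, a priority is attached to the name with a dot (the names and numbers here are illustrative):
```perl
// Higher number means higher priority; the default is 1.
important.2 : "go" "to"
unimportant.-1 : "go" "to"
```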
<a name="dirs"></a> | |||
## Directives | |||
### %ignore | |||
@@ -122,7 +179,7 @@ All occurrences of the terminal will be ignored, and won't be part of the parse. | |||
Using the `%ignore` directive results in a cleaner grammar. | |||
It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extranous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1. | |||
It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1. | |||
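For example, a typical sketch:
```perl
%import common.WS
%ignore WS
```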
**Syntax:**
```html
@@ -7,7 +7,7 @@ There are many ways you can help the project:
* Write new grammars for Lark's library
* Write a blog post introducing Lark to your audience
* Port Lark to another language
* Help me with code developemnt
* Help me with code development
If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.
@@ -60,4 +60,4 @@ Another way to run the tests is using setup.py:
```bash
python setup.py test
```
```
@@ -35,8 +35,8 @@ $ pip install lark-parser
* [Examples](https://github.com/lark-parser/lark/tree/master/examples)
* Tutorials
    * [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter
    * [How to write a JSON parser](json_tutorial.md)
* External
    * [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark
* Unofficial
    * [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3
* Guides
    * [How to use Lark](how_to_use.md)
@@ -44,6 +44,7 @@ $ pip install lark-parser
* Reference
    * [Grammar](grammar.md)
    * [Tree Construction](tree_construction.md)
    * [Visitors & Transformers](visitors.md)
    * [Classes](classes.md)
    * [Cheatsheet (PDF)](lark_cheatsheet.pdf)
* Discussion
@@ -230,7 +230,8 @@ from lark import Transformer
class MyTransformer(Transformer):
    def list(self, items):
        return list(items)
    def pair(self, (k,v)):
    def pair(self, key_value):
        k, v = key_value
        return k, v
    def dict(self, items):
        return dict(items)
@@ -251,9 +252,11 @@ Also, our definitions of list and dict are a bit verbose. We can do better:
from lark import Transformer
class TreeToJson(Transformer):
    def string(self, (s,)):
    def string(self, s):
        (s,) = s
        return s[1:-1]
    def number(self, (n,)):
    def number(self, n):
        (n,) = n
        return float(n)
    list = list
@@ -315,9 +318,11 @@ json_grammar = r"""
"""
class TreeToJson(Transformer):
    def string(self, (s,)):
    def string(self, s):
        (s,) = s
        return s[1:-1]
    def number(self, (n,)):
    def number(self, n):
        (n,) = n
        return float(n)
    list = list
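As an aside, the manual unpacking introduced above can also be written with Lark's `v_args(inline=True)` decorator, which passes children as positional arguments; a sketch:
```python
from lark import Transformer, v_args

class TreeToJson(Transformer):
    @v_args(inline=True)
    def string(self, s):
        return s[1:-1]

    @v_args(inline=True)
    def number(self, n):
        return float(n)
```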
@@ -5,9 +5,9 @@ Lark implements the following parsing algorithms: Earley, LALR(1), and CYK
An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar at O(n^3), and O(n^2) when the grammar is unambiguous. It can parse most LR grammars at O(n). Most programming languages are LR, and can be parsed in linear time.
Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitely using `lexer='dynamic'`.
Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.
It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independant first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`
It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`
**SPPF & Ambiguity resolution**
@@ -21,7 +21,7 @@ Lark provides the following options to combat ambiguity:
1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.
2) Users may choose to recieve the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.
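A sketch of the first two options (assumes `grammar` and `text` exist):
```python
from lark import Lark

# Option 1 (default): Lark resolves the ambiguity and returns one tree.
best_tree = Lark(grammar).parse(text)

# Option 2: receive every derivation; ambiguous spans appear under
# '_ambig' nodes in the returned tree.
forest = Lark(grammar, ambiguity='explicit').parse(text)
```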
@@ -0,0 +1,117 @@
## Transformers & Visitors
Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.
They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which allows you to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.
See: <a href="https://github.com/lark-parser/lark/blob/master/lark/visitors.py">visitors.py</a>
### Visitors
Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.
They work bottom-up, starting with the leaves and ending at the root of the tree.
**Example**
```python
from lark import Visitor

class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```
There are two classes that implement the visitor interface:
* Visitor - Visit every node (without recursion)
* Visitor_Recursive - Visit every node using recursion. Slightly faster.
### Transformers
Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.
They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.
Transformers can be used to implement map & reduce patterns.
Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).
Transformers can be chained into a new transformer by using multiplication, as shown below.
`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.
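A sketch of chaining (the transformer names are illustrative; each pass runs in order):
```python
# Equivalent to: SecondPass().transform(FirstPass().transform(tree))
combined = FirstPass() * SecondPass()
result = combined.transform(tree)
```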
**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform(t))
# Prints: Tree(a, [3])
```
All these classes implement the transformer interface:
- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
### visit_tokens
By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).
Example:
```python
class T(Transformer):
    INT = int
    NUMBER = float
    def NAME(self, name):
        return lookup_dict.get(name, name)

T(visit_tokens=True).transform(tree)
```
### v_args
`v_args` is a decorator.
By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.
When used on a transformer/visitor class definition, it applies to all the callback methods inside it.
`v_args` accepts one of three flags:
- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.
Examples:
```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right

class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```
### Discard
When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
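A sketch of pruning with it (the rule name is illustrative):
```python
from lark import Transformer
from lark.visitors import Discard

class StripComments(Transformer):
    def comment(self, children):
        # Raising Discard removes this 'comment' node from its parent's children.
        raise Discard
```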
@@ -5,4 +5,4 @@ from .exceptions import ParseError, LexError, GrammarError, UnexpectedToken, Une
from .lexer import Token
from .lark import Lark
__version__ = "0.7.4"
__version__ = "0.8.0rc1"
@@ -13,6 +13,14 @@ class ParseError(LarkError):
class LexError(LarkError):
    pass

class UnexpectedEOF(ParseError):
    def __init__(self, expected):
        self.expected = expected

        message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected))
        super(UnexpectedEOF, self).__init__(message)

class UnexpectedInput(LarkError):
    pos_in_stream = None
@@ -69,6 +69,7 @@ class LarkOptions(Serialize):
        'propagate_positions': False,
        'lexer_callbacks': {},
        'maybe_placeholders': False,
        'edit_terminals': None,
    }

    def __init__(self, options_dict):
@@ -85,7 +86,7 @@ class LarkOptions(Serialize):
            options[name] = value

        if isinstance(options['start'], str):
        if isinstance(options['start'], STRING_TYPE):
            options['start'] = [options['start']]

        self.__dict__['options'] = options
@@ -205,6 +206,10 @@ class Lark(Serialize):
        # Compile the EBNF grammar into BNF
        self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)

        if self.options.edit_terminals:
            for t in self.terminals:
                self.options.edit_terminals(t)

        self._terminals_dict = {t.name:t for t in self.terminals}

        # If the user asked to invert the priorities, negate them all here.
@@ -3,7 +3,7 @@
import re

from .utils import Str, classify, get_regexp_width, Py36, Serialize
from .exceptions import UnexpectedCharacters, LexError
from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken

###{standalone
@@ -43,7 +43,7 @@ class PatternStr(Pattern):
    __serialize_fields__ = 'value', 'flags'

    type = "str"

    def to_regexp(self):
        return self._get_flags(re.escape(self.value))
@@ -166,36 +166,33 @@ class _Lex:
        while line_ctr.char_pos < len(stream):
            lexer = self.lexer
            for mre, type_from_index in lexer.mres:
                m = mre.match(stream, line_ctr.char_pos)
                if not m:
                    continue
                t = None
                value = m.group(0)
                type_ = type_from_index[m.lastindex]
                if type_ not in ignore_types:
                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
                    if t.type in lexer.callback:
                        t = lexer.callback[t.type](t)
                        if not isinstance(t, Token):
                            raise ValueError("Callbacks must return a token (returned %r)" % t)
                    last_token = t
                    yield t
                else:
                    if type_ in lexer.callback:
                        t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
                        lexer.callback[type_](t)
                line_ctr.feed(value, type_ in newline_types)
                if t:
                    t.end_line = line_ctr.line
                    t.end_column = line_ctr.column
            res = lexer.match(stream, line_ctr.char_pos)
            if not res:
                allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types
                if not allowed:
                    allowed = {"<END-OF-FILE>"}
                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
                break
            value, type_ = res
            t = None
            if type_ not in ignore_types:
                t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
                if t.type in lexer.callback:
                    t = lexer.callback[t.type](t)
                    if not isinstance(t, Token):
                        raise ValueError("Callbacks must return a token (returned %r)" % t)
                last_token = t
                yield t
            else:
                allowed = {v for m, tfi in lexer.mres for v in tfi.values()}
                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
                if type_ in lexer.callback:
                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
                    lexer.callback[type_](t)
            line_ctr.feed(value, type_ in newline_types)
            if t:
                t.end_line = line_ctr.line
                t.end_column = line_ctr.column
class UnlessCallback:
@@ -330,6 +327,11 @@ class TraditionalLexer(Lexer):
        self.mres = build_mres(terminals)

    def match(self, stream, pos):
        for mre, type_from_index in self.mres:
            m = mre.match(stream, pos)
            if m:
                return m.group(0), type_from_index[m.lastindex]

    def lex(self, stream):
        return _Lex(self).lex(stream, self.newline_types, self.ignore_types)
@@ -367,9 +369,21 @@ class ContextualLexer(Lexer):
    def lex(self, stream):
        l = _Lex(self.lexers[self.parser_state], self.parser_state)
        for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
            yield x
            l.lexer = self.lexers[self.parser_state]
            l.state = self.parser_state
        try:
            for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
                yield x
                l.lexer = self.lexers[self.parser_state]
                l.state = self.parser_state
        except UnexpectedCharacters as e:
            # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined,
            # but not in the current context.
            # This tests the input against the global context, to provide a nicer error.
            root_match = self.root_lexer.match(stream, e.pos_in_stream)
            if not root_match:
                raise
            value, type_ = root_match
            t = Token(type_, value, e.pos_in_stream, e.line, e.column)
            raise UnexpectedToken(t, e.allowed, state=e.state)

###}
@@ -479,7 +479,7 @@ class Grammar:
        # ===================
        # Convert terminal-trees to strings/regexps
        transformer = PrepareLiterals() * TerminalTreeToPattern()
        for name, (term_tree, priority) in term_defs:
            if term_tree is None: # Terminal added through %declare
                continue
@@ -487,7 +487,8 @@
            if len(expansions) == 1 and not expansions[0].children:
                raise GrammarError("Terminals cannot be empty (%s)" % name)
        terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
        transformer = PrepareLiterals() * TerminalTreeToPattern()
        terminals = [TerminalDef(name, transformer.transform( term_tree ), priority)
                     for name, (term_tree, priority) in term_defs if term_tree]
        # =================
@@ -638,11 +639,10 @@ def import_from_grammar_into_namespace(grammar, namespace, aliases):
def resolve_term_references(term_defs):
    # TODO Cycles detection
    # TODO Solve with transitive closure (maybe)
    token_dict = {k:t for k, (t,_p) in term_defs}
    assert len(token_dict) == len(term_defs), "Same name defined twice?"
    term_dict = {k:t for k, (t,_p) in term_defs}
    assert len(term_dict) == len(term_defs), "Same name defined twice?"

    while True:
        changed = False
@@ -655,11 +655,21 @@ def resolve_term_references(term_defs):
                if item.type == 'RULE':
                    raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name))
                if item.type == 'TERMINAL':
                    exp.children[0] = token_dict[item]
                    term_value = term_dict[item]
                    assert term_value is not None
                    exp.children[0] = term_value
                    changed = True
        if not changed:
            break

    for name, term in term_dict.items():
        if term: # Not just declared
            for child in term.children:
                ids = [id(x) for x in child.iter_subtrees()]
                if id(term) in ids:
                    raise GrammarError("Recursion in terminal '%s' (recursion is only allowed in rules, not terminals)" % name)

def options_from_rule(name, *x):
    if len(x) > 1:
        priority, expansions = x
@@ -10,10 +10,11 @@ is better documented here:
http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/
"""

import logging
from collections import deque

from ..visitors import Transformer_InPlace, v_args
from ..exceptions import ParseError, UnexpectedToken
from ..exceptions import UnexpectedEOF, UnexpectedToken
from .grammar_analysis import GrammarAnalyzer
from ..grammar import NonTerminal
from .earley_common import Item, TransitiveItem
@@ -45,12 +46,8 @@ class Parser:
            # skip the extra tree walk. We'll also skip this if the user just didn't specify priorities
            # on any rules.
            if self.forest_sum_visitor is None and rule.options and rule.options.priority is not None:
                self.forest_sum_visitor = ForestSumVisitor()
                self.forest_sum_visitor = ForestSumVisitor

        if resolve_ambiguity:
            self.forest_tree_visitor = ForestToTreeVisitor(self.callbacks, self.forest_sum_visitor)
        else:
            self.forest_tree_visitor = ForestToAmbiguousTreeVisitor(self.callbacks, self.forest_sum_visitor)

        self.term_matcher = term_matcher
@@ -273,6 +270,7 @@ class Parser:
        ## Column is now the final column in the parse.
        assert i == len(columns)-1
        return to_scan

    def parse(self, stream, start):
        assert start, start
@@ -291,7 +289,7 @@
            else:
                columns[0].add(item)

        self._parse(stream, columns, to_scan, start_symbol)
        to_scan = self._parse(stream, columns, to_scan, start_symbol)

        # If the parse was successful, the start
        # symbol should have been completed in the last step of the Earley cycle, and will be in
@@ -299,18 +297,25 @@
        solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0]
        if self.debug:
            from .earley_forest import ForestToPyDotVisitor
            debug_walker = ForestToPyDotVisitor()
            debug_walker.visit(solutions[0], "sppf.png")
            try:
                debug_walker = ForestToPyDotVisitor()
            except ImportError:
                logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
            else:
                debug_walker.visit(solutions[0], "sppf.png")

        if not solutions:
            expected_tokens = [t.expect for t in to_scan]
            # raise ParseError('Incomplete parse: Could not find a solution to input')
            raise ParseError('Unexpected end of input! Expecting a terminal of: %s' % expected_tokens)
            raise UnexpectedEOF(expected_tokens)
        elif len(solutions) > 1:
            assert False, 'Earley should not generate multiple start symbol items!'

        # Perform our SPPF -> AST conversion using the right ForestVisitor.
        return self.forest_tree_visitor.visit(solutions[0])
        forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor
        forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor())
        return forest_tree_visitor.visit(solutions[0])

class ApplyCallbacks(Transformer_InPlace):
@@ -146,4 +146,5 @@ class Parser(BaseParser):
        self.predict_and_complete(i, to_scan, columns, transitives)

        ## Column is now the final column in the parse.
        assert i == len(columns)-1
        assert i == len(columns)-1
        return to_scan
@@ -3,6 +3,7 @@ from functools import wraps
from .utils import smart_decorator
from .tree import Tree
from .exceptions import VisitError, GrammarError
from .lexer import Token

###{standalone
from inspect import getmembers, getmro
@@ -21,6 +22,10 @@ class Transformer:
    Can be used to implement map or reduce.
    """

    __visit_tokens__ = False # For backwards compatibility

    def __init__(self, visit_tokens=False):
        self.__visit_tokens__ = visit_tokens

    def _call_userfunc(self, tree, new_children=None):
        # Assumes tree is already transformed
        children = new_children if new_children is not None else tree.children
@@ -45,10 +50,29 @@ class Transformer:
            except Exception as e:
                raise VisitError(tree, e)

    def _call_userfunc_token(self, token):
        try:
            f = getattr(self, token.type)
        except AttributeError:
            return self.__default_token__(token)
        else:
            try:
                return f(token)
            except (GrammarError, Discard):
                raise
            except Exception as e:
                raise VisitError(token, e)

    def _transform_children(self, children):
        for c in children:
            try:
                yield self._transform_tree(c) if isinstance(c, Tree) else c
                if isinstance(c, Tree):
                    yield self._transform_tree(c)
                elif self.__visit_tokens__ and isinstance(c, Token):
                    yield self._call_userfunc_token(c)
                else:
                    yield c
            except Discard:
                pass
@@ -66,6 +90,11 @@ class Transformer:
        "Default operation on tree (for override)"
        return Tree(data, children, meta)

    def __default_token__(self, token):
        "Default operation on token (for override)"
        return token

    @classmethod
    def _apply_decorator(cls, decorator, **kwargs):
        mro = getmro(cls)
@@ -157,6 +186,11 @@ class Visitor(VisitorBase):
            self._call_userfunc(subtree)
        return tree

    def visit_topdown(self,tree):
        for subtree in tree.iter_subtrees_topdown():
            self._call_userfunc(subtree)
        return tree

class Visitor_Recursive(VisitorBase):
    """Bottom-up visitor, recursive
@@ -169,8 +203,16 @@ class Visitor_Recursive(VisitorBase):
            if isinstance(child, Tree):
                self.visit(child)

        f = getattr(self, tree.data, self.__default__)
        f(tree)
        self._call_userfunc(tree)
        return tree

    def visit_topdown(self,tree):
        self._call_userfunc(tree)

        for child in tree.children:
            if isinstance(child, Tree):
                self.visit_topdown(child)

        return tree
@@ -9,5 +9,6 @@ pages:
  - How To Develop (Guide): how_to_develop.md
  - Grammar Reference: grammar.md
  - Tree Construction Reference: tree_construction.md
  - Visitors and Transformers: visitors.md
  - Classes Reference: classes.md
  - Recipes: recipes.md
@@ -10,7 +10,7 @@ from .test_reconstructor import TestReconstructor
try:
    from .test_nearley.test_nearley import TestNearley
except ImportError:
    pass
    logging.warn("Warning: Skipping tests for Nearley (js2py required)")

# from .test_selectors import TestSelectors
# from .test_grammars import TestPythonG, TestConfigG
@@ -15,9 +15,12 @@ NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley')
BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin')

if not os.path.exists(NEARLEY_PATH):
    print("Skipping Nearley tests!")
    logging.warn("Nearley not installed. Skipping Nearley tests!")
    raise ImportError("Skipping Nearley tests!")

import js2py # Ensures that js2py exists, to avoid failing tests

class TestNearley(unittest.TestCase):
    def test_css(self):
        fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')
@@ -94,6 +94,24 @@ class TestParsers(unittest.TestCase):
        r = g.parse('xx')
        self.assertEqual( r.children[0].data, "c" )

    def test_visit_tokens(self):
        class T(Transformer):
            def a(self, children):
                return children[0] + "!"
            def A(self, tok):
                return tok.upper()

        # Test regular
        g = Lark("""start: a
                    a : A
                    A: "x"
                    """, parser='lalr')
        r = T().transform(g.parse("x"))
        self.assertEqual( r.children, ["x!"] )

        r = T(True).transform(g.parse("x"))
        self.assertEqual( r.children, ["X!"] )

    def test_embedded_transformer(self):
        class T(Transformer):
            def a(self, children):
@@ -7,7 +7,7 @@ import pickle
import functools

from lark.tree import Tree
from lark.visitors import Transformer, Interpreter, visit_children_decor, v_args, Discard
from lark.visitors import Visitor, Visitor_Recursive, Transformer, Interpreter, visit_children_decor, v_args, Discard

class TestTrees(TestCase):
@@ -34,6 +34,43 @@ class TestTrees(TestCase):
        nodes = list(self.tree1.iter_subtrees_topdown())
        self.assertEqual(nodes, expected)

    def test_visitor(self):
        class Visitor1(Visitor):
            def __init__(self):
                self.nodes = []

            def __default__(self, tree):
                self.nodes.append(tree)

        class Visitor1_Recursive(Visitor_Recursive):
            def __init__(self):
                self.nodes = []

            def __default__(self, tree):
                self.nodes.append(tree)

        visitor1 = Visitor1()
        visitor1_recursive = Visitor1_Recursive()

        expected_top_down = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]),
                             Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]
        expected_bottom_up = [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'),
                              Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])]

        visitor1.visit(self.tree1)
        self.assertEqual(visitor1.nodes, expected_bottom_up)

        visitor1_recursive.visit(self.tree1)
        self.assertEqual(visitor1_recursive.nodes, expected_bottom_up)

        visitor1.nodes = []
        visitor1_recursive.nodes = []

        visitor1.visit_topdown(self.tree1)
        self.assertEqual(visitor1.nodes, expected_top_down)

        visitor1_recursive.visit_topdown(self.tree1)
        self.assertEqual(visitor1_recursive.nodes, expected_top_down)

    def test_interp(self):
        t = Tree('a', [Tree('b', []), Tree('c', []), 'd'])