
Merge branch 'master' into true_lalr3

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.8.0
Erez Sh 5 years ago
parent commit 7c5917ba19
20 changed files with 410 additions and 180 deletions
  1. docs/classes.md (+12, -106)
  2. docs/grammar.md (+58, -1)
  3. docs/how_to_develop.md (+2, -2)
  4. docs/index.md (+3, -2)
  5. docs/json_tutorial.md (+10, -5)
  6. docs/parsers.md (+3, -3)
  7. docs/visitors.md (+117, -0)
  8. lark/__init__.py (+1, -1)
  9. lark/exceptions.py (+8, -0)
 10. lark/lark.py (+6, -1)
 11. lark/lexer.py (+48, -34)
 12. lark/load_grammar.py (+16, -6)
 13. lark/parsers/earley.py (+17, -12)
 14. lark/parsers/xearley.py (+2, -1)
 15. lark/visitors.py (+45, -3)
 16. mkdocs.yml (+1, -0)
 17. tests/__main__.py (+1, -1)
 18. tests/test_nearley/test_nearley.py (+4, -1)
 19. tests/test_parser.py (+18, -0)
 20. tests/test_trees.py (+38, -1)

docs/classes.md (+12, -106)

@@ -1,15 +1,13 @@
# Classes - Reference
# Classes Reference

This page details the important classes in Lark.

----

## Lark
## lark.Lark

The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.

### Methods

#### \_\_init\_\_(self, grammar, **options)

The Lark class accepts a grammar string or file object, and keyword options:
@@ -50,14 +48,10 @@ If a transformer is supplied to `__init__`, returns whatever is the result of th

The main tree class

### Properties

* `data` - The name of the rule or alias
* `children` - List of matched sub-rules and terminals
* `meta` - Line & Column numbers, if using `propagate_positions`

### Methods

#### \_\_init\_\_(self, data, children)

Creates a new tree, and stores "data" and "children" in attributes of the same name.
@@ -92,102 +86,6 @@ Trees can be hashed and compared.

----

## Transformers & Visitors

Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each methods accepts the children as an argument. That can be modified using the `v-args` decorator, which allows to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.

See: https://github.com/lark-parser/lark/blob/master/lark/visitors.py

### Visitors

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

**Example**
```python
class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```

There are two classes that implement the visitor interface:

* Visitor - Visit every node (without recursion)

* Visitor_Recursive - Visit every node using recursion. Slightly faster.

### Transformers

Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.

Transformers can be used to implement map & reduce patterns.

Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).

Transformers can be chained into a new transformer by using multiplication.

**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])
```


Here are the classes that implement the transformer interface:

- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances

### v_args

`v_args` is a decorator.

By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.

When used on a transformer/visitor class definition, it applies to all the callback methods inside it.

`v_args` accepts one of three flags:

- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.

Examples:

```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```

### Discard

When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.

## Token

When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes:
@@ -199,17 +97,25 @@ When using a lexer, the resulting tokens in the trees will be of the Token class
* `end_line` - The line where the token ends
* `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.

## Transformer
## Visitor
## Interpreter

See the [visitors page](visitors.md)


## UnexpectedInput

## UnexpectedToken

## UnexpectedException

- `UnexpectedInput`
- `UnexpectedToken` - The parser received an unexpected token
- `UnexpectedCharacters` - The lexer encountered an unexpected string

After catching one of these exceptions, you may call the following helper methods to create a nicer error message:

### Methods

#### get_context(text, span)

Returns a pretty string pinpointing the error in the text, with `span` amount of context characters around it.
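The windowing that `get_context` performs can be sketched in plain Python. This is an illustrative re-implementation of the idea (an excerpt of the error line plus a caret under the error column), not Lark's exact code:

```python
def get_context(text, pos, span=40):
    """Return a two-line excerpt pinpointing `pos` in `text`:
    the surrounding characters, then a caret under the error column."""
    start = max(pos - span, 0)
    end = pos + span
    before = text[start:pos].rsplit('\n', 1)[-1]   # error line, up to pos
    after = text[pos:end].split('\n', 1)[0]        # rest of the error line
    return before + after + '\n' + ' ' * len(before) + '^\n'

text = "a = 1\nb = $!\nc = 3\n"
print(get_context(text, text.index('$')))
# b = $!
#     ^
```

In real code, `pos` would come from the exception's `pos_in_stream` attribute.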


docs/grammar.md (+58, -1)

@@ -1,5 +1,13 @@
# Grammar Reference

Table of contents:

1. [Definitions](#defs)
1. [Terminals](#terms)
1. [Rules](#rules)
1. [Directives](#dirs)

<a name="defs"></a>
## Definitions

**A grammar** is a list of rules and terminals, that together define a language.
@@ -25,6 +33,7 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o
Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects, for the shape of the generated parse-tree, and the automatic construction of the lexer (aka tokenizer, or scanner).


<a name="terms"></a>
## Terminals

Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.
@@ -70,6 +79,53 @@ WHITESPACE: (" " | /\t/ )+
SQL_SELECT: "select"i
```

### Regular expressions & Ambiguity

Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.

For example, in the following grammar, `A1` and `A2` are equivalent:
```perl
A1: "a" | "b"
A2: /a|b/
```

This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.

For example, for this grammar:
```perl
start : (A | B)+
A : "a" | "ab"
B : "b"
```
We get this behavior:

```bash
>>> p.parse("ab")
Tree(start, [Token(A, 'a'), Token(B, 'b')])
```

This is happening because Python's regex engine always returns the first matching option.
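This first-match behavior is easy to verify with Python's `re` module directly, independently of Lark:

```python
import re

# The terminal A: "a" | "ab" compiles to the regex a|ab.
# Python's regex engine tries alternatives left to right and returns
# as soon as one succeeds -- it does not prefer the longest match.
m = re.match(r'a|ab', 'ab')
print(m.group(0))   # -> 'a', not 'ab'

# Reordering the alternatives changes the result:
m2 = re.match(r'ab|a', 'ab')
print(m2.group(0))  # -> 'ab'
```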

If you find yourself in this situation, the recommended solution is to use rules instead.

Example:

```python
>>> p = Lark("""start: (a | b)+
... !a: "a" | "ab"
... !b: "b"
... """, ambiguity="explicit")
>>> print(p.parse("ab").pretty())
_ambig
  start
    a	ab
  start
    a	a
    b	b
```


<a name="rules"></a>
## Rules

**Syntax:**
@@ -114,6 +170,7 @@ Rules can be assigned priority only when using Earley (future versions may suppo

Priority can be either positive or negative. If not specified for a terminal, it's assumed to be 1 (i.e. the default).

<a name="dirs"></a>
## Directives

### %ignore
@@ -122,7 +179,7 @@ All occurrences of the terminal will be ignored, and won't be part of the parse.

Using the `%ignore` directive results in a cleaner grammar.

It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extranous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.
It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.

**Syntax:**
```html


docs/how_to_develop.md (+2, -2)

@@ -7,7 +7,7 @@ There are many ways you can help the project:
* Write new grammars for Lark's library
* Write a blog post introducing Lark to your audience
* Port Lark to another language
* Help me with code developemnt
* Help me with code development

If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.

@@ -60,4 +60,4 @@ Another way to run the tests is using setup.py:

```bash
python setup.py test
```
```

docs/index.md (+3, -2)

@@ -35,8 +35,8 @@ $ pip install lark-parser
* [Examples](https://github.com/lark-parser/lark/tree/master/examples)
* Tutorials
* [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter
* [How to write a JSON parser](json_tutorial.md)
* External
* [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark
* Unofficial
* [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3
* Guides
* [How to use Lark](how_to_use.md)
@@ -44,6 +44,7 @@ $ pip install lark-parser
* Reference
* [Grammar](grammar.md)
* [Tree Construction](tree_construction.md)
* [Visitors & Transformers](visitors.md)
* [Classes](classes.md)
* [Cheatsheet (PDF)](lark_cheatsheet.pdf)
* Discussion


docs/json_tutorial.md (+10, -5)

@@ -230,7 +230,8 @@ from lark import Transformer
class MyTransformer(Transformer):
    def list(self, items):
        return list(items)
    def pair(self, (k,v)):
    def pair(self, key_value):
        k, v = key_value
        return k, v
    def dict(self, items):
        return dict(items)
@@ -251,9 +252,11 @@ Also, our definitions of list and dict are a bit verbose. We can do better:
from lark import Transformer

class TreeToJson(Transformer):
    def string(self, (s,)):
    def string(self, s):
        (s,) = s
        return s[1:-1]
    def number(self, (n,)):
    def number(self, n):
        (n,) = n
        return float(n)

    list = list
@@ -315,9 +318,11 @@ json_grammar = r"""
"""

class TreeToJson(Transformer):
    def string(self, (s,)):
    def string(self, s):
        (s,) = s
        return s[1:-1]
    def number(self, (n,)):
    def number(self, n):
        (n,) = n
        return float(n)

    list = list
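The signature changes in this file are needed because tuple parameters such as `def pair(self, (k, v))` are Python 2-only syntax, removed in Python 3 by PEP 3113; the unpacking now happens explicitly in the function body. A minimal standalone illustration:

```python
# Python 2 allowed:  def pair((k, v)): ...
# In Python 3 the tuple arrives as one argument and is unpacked in the body:
def pair(key_value):
    k, v = key_value
    return k, v

print(pair(("name", "lark")))   # -> ('name', 'lark')
```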


docs/parsers.md (+3, -3)

@@ -5,9 +5,9 @@ Lark implements the following parsing algorithms: Earley, LALR(1), and CYK

An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar at O(n^3), and O(n^2) when the grammar is unambiguous. It can parse most LR grammars at O(n). Most programming languages are LR, and can be parsed in linear time.

Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitely using `lexer='dynamic'`.
Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.

It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independant first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`
It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`

**SPPF & Ambiguity resolution**

@@ -21,7 +21,7 @@ Lark provides the following options to combat ambiguity:

1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.

2) Users may choose to recieve the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.

3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.



docs/visitors.md (+117, -0)

@@ -0,0 +1,117 @@
## Transformers & Visitors

Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which makes it possible to inline the arguments (akin to `*args`), or to add the tree's `meta` property as an argument.

See: <a href="https://github.com/lark-parser/lark/blob/master/lark/visitors.py">visitors.py</a>

### Visitors

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

**Example**
```python
class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```

There are two classes that implement the visitor interface:

* Visitor - Visit every node (without recursion)

* Visitor_Recursive - Visit every node using recursion. Slightly faster.

### Transformers

Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.

Transformers can be used to implement map & reduce patterns.

Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).

Transformers can be chained into a new transformer by using multiplication.

`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.


**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])
```

All these classes implement the transformer interface:

- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
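The bottom-up reduction these classes perform can be sketched without Lark at all, using a toy tree and dispatch by node `data`. This is a hypothetical minimal re-implementation (a bare `data`/`children` node, not Lark's `Tree`), meant only to show the contract that children are transformed before their parent:

```python
class Node:
    def __init__(self, data, children):
        self.data, self.children = data, children

def transform(node, callbacks):
    """Reduce a tree bottom-up: children are transformed first, so each
    callback sees already-transformed values."""
    if not isinstance(node, Node):
        return node
    children = [transform(c, callbacks) for c in node.children]
    f = callbacks.get(node.data)
    return f(children) if f else Node(node.data, children)

# 'expr' nodes are evaluated; 'a' keeps its (transformed) children
t = Node('a', [Node('expr', ['1+2'])])
result = transform(t, {'expr': lambda args: eval(args[0])})
print(result.children)   # -> [3]
```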

### visit_tokens

By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).

Example:

```python
class T(Transformer):
    INT = int
    NUMBER = float
    def NAME(self, name):
        return lookup_dict.get(name, name)


T(visit_tokens=True).transform(tree)
```
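The dispatch behind `visit_tokens` can be mimicked in isolation: a method named after the token's terminal type is looked up with `getattr`, with a pass-through default. This is an illustrative sketch with a hypothetical minimal `Token` stand-in, not Lark's internals verbatim:

```python
class Token(str):
    """Minimal stand-in: a string that also carries a terminal type."""
    def __new__(cls, type_, value):
        obj = super().__new__(cls, value)
        obj.type = type_
        return obj

class TokenVisitor:
    def _call_userfunc_token(self, token):
        # Dispatch on the terminal name; unknown types pass through unchanged.
        f = getattr(self, token.type, None)
        return f(token) if f else token

class T(TokenVisitor):
    INT = int                 # class attribute works as a callback too
    def NAME(self, tok):
        return tok.upper()

t = T()
print(t._call_userfunc_token(Token('INT', '42')))     # -> 42
print(t._call_userfunc_token(Token('NAME', 'foo')))   # -> FOO
print(t._call_userfunc_token(Token('OTHER', 'x')))    # -> x (unchanged)
```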


### v_args

`v_args` is a decorator.

By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.

When used on a transformer/visitor class definition, it applies to all the callback methods inside it.

`v_args` accepts one of three flags:

- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.

Examples:

```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```

### Discard

When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.



lark/__init__.py (+1, -1)

@@ -5,4 +5,4 @@ from .exceptions import ParseError, LexError, GrammarError, UnexpectedToken, Une
from .lexer import Token
from .lark import Lark

-__version__ = "0.7.4"
+__version__ = "0.8.0rc1"

lark/exceptions.py (+8, -0)

@@ -13,6 +13,14 @@ class ParseError(LarkError):
 class LexError(LarkError):
     pass

+class UnexpectedEOF(ParseError):
+    def __init__(self, expected):
+        self.expected = expected
+
+        message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected))
+        super(UnexpectedEOF, self).__init__(message)


 class UnexpectedInput(LarkError):
     pos_in_stream = None



lark/lark.py (+6, -1)

@@ -69,6 +69,7 @@ class LarkOptions(Serialize):
         'propagate_positions': False,
         'lexer_callbacks': {},
         'maybe_placeholders': False,
+        'edit_terminals': None,
     }

     def __init__(self, options_dict):
@@ -85,7 +86,7 @@ class LarkOptions(Serialize):

             options[name] = value

-        if isinstance(options['start'], str):
+        if isinstance(options['start'], STRING_TYPE):
             options['start'] = [options['start']]

         self.__dict__['options'] = options
@@ -205,6 +206,10 @@ class Lark(Serialize):
         # Compile the EBNF grammar into BNF
         self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)

+        if self.options.edit_terminals:
+            for t in self.terminals:
+                self.options.edit_terminals(t)
+
         self._terminals_dict = {t.name:t for t in self.terminals}

         # If the user asked to invert the priorities, negate them all here.


lark/lexer.py (+48, -34)

@@ -3,7 +3,7 @@
import re

from .utils import Str, classify, get_regexp_width, Py36, Serialize
-from .exceptions import UnexpectedCharacters, LexError
+from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken

###{standalone

@@ -43,7 +43,7 @@ class PatternStr(Pattern):
     __serialize_fields__ = 'value', 'flags'

     type = "str"
     def to_regexp(self):
         return self._get_flags(re.escape(self.value))

@@ -166,36 +166,33 @@ class _Lex:

         while line_ctr.char_pos < len(stream):
             lexer = self.lexer
-            for mre, type_from_index in lexer.mres:
-                m = mre.match(stream, line_ctr.char_pos)
-                if not m:
-                    continue
-
-                t = None
-                value = m.group(0)
-                type_ = type_from_index[m.lastindex]
-                if type_ not in ignore_types:
-                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
-                    if t.type in lexer.callback:
-                        t = lexer.callback[t.type](t)
-                        if not isinstance(t, Token):
-                            raise ValueError("Callbacks must return a token (returned %r)" % t)
-                    last_token = t
-                    yield t
-                else:
-                    if type_ in lexer.callback:
-                        t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
-                        lexer.callback[type_](t)
-
-                line_ctr.feed(value, type_ in newline_types)
-                if t:
-                    t.end_line = line_ctr.line
-                    t.end_column = line_ctr.column
-
-                break
-            else:
-                allowed = {v for m, tfi in lexer.mres for v in tfi.values()}
-                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+            res = lexer.match(stream, line_ctr.char_pos)
+            if not res:
+                allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types
+                if not allowed:
+                    allowed = {"<END-OF-FILE>"}
+                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+
+            value, type_ = res
+
+            t = None
+            if type_ not in ignore_types:
+                t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+                if t.type in lexer.callback:
+                    t = lexer.callback[t.type](t)
+                    if not isinstance(t, Token):
+                        raise ValueError("Callbacks must return a token (returned %r)" % t)
+                last_token = t
+                yield t
+            else:
+                if type_ in lexer.callback:
+                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+                    lexer.callback[type_](t)
+
+            line_ctr.feed(value, type_ in newline_types)
+            if t:
+                t.end_line = line_ctr.line
+                t.end_column = line_ctr.column


class UnlessCallback:
@@ -330,6 +327,11 @@ class TraditionalLexer(Lexer):

         self.mres = build_mres(terminals)

+    def match(self, stream, pos):
+        for mre, type_from_index in self.mres:
+            m = mre.match(stream, pos)
+            if m:
+                return m.group(0), type_from_index[m.lastindex]
+
     def lex(self, stream):
         return _Lex(self).lex(stream, self.newline_types, self.ignore_types)
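The union-regex trick behind this `match` method can be shown standalone: terminals are combined into one alternation of named groups, and `m.lastindex` identifies which terminal matched. This is a simplified sketch of the mechanism (real Lark also handles flags, priorities, and splits the union across several compiled regexes):

```python
import re

terminals = [('NUMBER', r'\d+'), ('NAME', r'[a-z]+'), ('WS', r'\s+')]
# One big alternation of named groups, in terminal order.
mre = re.compile('|'.join('(?P<%s>%s)' % t for t in terminals))
# Map each group index back to the terminal name.
type_from_index = {i + 1: name for i, (name, _) in enumerate(terminals)}

def match(stream, pos):
    m = mre.match(stream, pos)
    if m:
        return m.group(0), type_from_index[m.lastindex]

print(match("ab 12", 0))   # -> ('ab', 'NAME')
print(match("ab 12", 3))   # -> ('12', 'NUMBER')
print(match("!?", 0))      # -> None (no terminal matches)
```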
@@ -367,9 +369,21 @@ class ContextualLexer(Lexer):

     def lex(self, stream):
         l = _Lex(self.lexers[self.parser_state], self.parser_state)
-        for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
-            yield x
-            l.lexer = self.lexers[self.parser_state]
-            l.state = self.parser_state
+        try:
+            for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
+                yield x
+                l.lexer = self.lexers[self.parser_state]
+                l.state = self.parser_state
+        except UnexpectedCharacters as e:
+            # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined,
+            # but not in the current context.
+            # This tests the input against the global context, to provide a nicer error.
+            root_match = self.root_lexer.match(stream, e.pos_in_stream)
+            if not root_match:
+                raise
+
+            value, type_ = root_match
+            t = Token(type_, value, e.pos_in_stream, e.line, e.column)
+            raise UnexpectedToken(t, e.allowed, state=e.state)

###}

lark/load_grammar.py (+16, -6)

@@ -479,7 +479,7 @@ class Grammar:
# ===================

         # Convert terminal-trees to strings/regexps
-        transformer = PrepareLiterals() * TerminalTreeToPattern()
         for name, (term_tree, priority) in term_defs:
             if term_tree is None:  # Terminal added through %declare
                 continue
@@ -487,7 +487,8 @@ class Grammar:
             if len(expansions) == 1 and not expansions[0].children:
                 raise GrammarError("Terminals cannot be empty (%s)" % name)

-        terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
+        transformer = PrepareLiterals() * TerminalTreeToPattern()
+        terminals = [TerminalDef(name, transformer.transform( term_tree ), priority)
                      for name, (term_tree, priority) in term_defs if term_tree]

# =================
@@ -638,11 +639,10 @@ def import_from_grammar_into_namespace(grammar, namespace, aliases):


 def resolve_term_references(term_defs):
-    # TODO Cycles detection
     # TODO Solve with transitive closure (maybe)

-    token_dict = {k:t for k, (t,_p) in term_defs}
-    assert len(token_dict) == len(term_defs), "Same name defined twice?"
+    term_dict = {k:t for k, (t,_p) in term_defs}
+    assert len(term_dict) == len(term_defs), "Same name defined twice?"

     while True:
         changed = False
@@ -655,11 +655,21 @@ def resolve_term_references(term_defs):
                 if item.type == 'RULE':
                     raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name))
                 if item.type == 'TERMINAL':
-                    exp.children[0] = token_dict[item]
+                    term_value = term_dict[item]
+                    assert term_value is not None
+                    exp.children[0] = term_value
                     changed = True
         if not changed:
             break

+    for name, term in term_dict.items():
+        if term:    # Not just declared
+            for child in term.children:
+                ids = [id(x) for x in child.iter_subtrees()]
+                if id(term) in ids:
+                    raise GrammarError("Recursion in terminal '%s' (recursion is only allowed in rules, not terminals)" % name)


 def options_from_rule(name, *x):
     if len(x) > 1:
         priority, expansions = x
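The `id()`-based cycle check added to `resolve_term_references` can be demonstrated on a toy tree. This is a hypothetical minimal stand-in for Lark's `Tree` (note the walk de-duplicates by `id()`, as Lark's `iter_subtrees` does, so a cyclic reference cannot loop forever):

```python
class Tree:
    def __init__(self, data, children):
        self.data, self.children = data, children

    def iter_subtrees(self):
        # Breadth-first walk, de-duplicated by id(), so shared or
        # cyclic references are visited at most once.
        seen = {}
        queue = [self]
        for subtree in queue:           # queue grows while we iterate
            if id(subtree) in seen:
                continue
            seen[id(subtree)] = subtree
            queue += [c for c in subtree.children if isinstance(c, Tree)]
        return seen.values()

def is_recursive(term):
    # A terminal is recursive if its own node is reachable from a child.
    for child in term.children:
        if isinstance(child, Tree):
            if id(term) in [id(x) for x in child.iter_subtrees()]:
                return True
    return False

a = Tree('A', [])
a.children.append(Tree('seq', [a]))   # A references itself -> recursion
b = Tree('B', [Tree('seq', ['b'])])   # plain terminal, no cycle
print(is_recursive(a), is_recursive(b))   # -> True False
```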


lark/parsers/earley.py (+17, -12)

@@ -10,10 +10,11 @@ is better documented here:
http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/
"""

+import logging
 from collections import deque

 from ..visitors import Transformer_InPlace, v_args
-from ..exceptions import ParseError, UnexpectedToken
+from ..exceptions import UnexpectedEOF, UnexpectedToken
 from .grammar_analysis import GrammarAnalyzer
 from ..grammar import NonTerminal
 from .earley_common import Item, TransitiveItem
@@ -45,12 +46,8 @@ class Parser:
         # skip the extra tree walk. We'll also skip this if the user just didn't specify priorities
         # on any rules.
         if self.forest_sum_visitor is None and rule.options and rule.options.priority is not None:
-            self.forest_sum_visitor = ForestSumVisitor()
+            self.forest_sum_visitor = ForestSumVisitor

-        if resolve_ambiguity:
-            self.forest_tree_visitor = ForestToTreeVisitor(self.callbacks, self.forest_sum_visitor)
-        else:
-            self.forest_tree_visitor = ForestToAmbiguousTreeVisitor(self.callbacks, self.forest_sum_visitor)
         self.term_matcher = term_matcher


@@ -273,6 +270,7 @@ class Parser:

         ## Column is now the final column in the parse.
         assert i == len(columns)-1
+        return to_scan

     def parse(self, stream, start):
         assert start, start
@@ -291,7 +289,7 @@ class Parser:
             else:
                 columns[0].add(item)

-        self._parse(stream, columns, to_scan, start_symbol)
+        to_scan = self._parse(stream, columns, to_scan, start_symbol)

         # If the parse was successful, the start
         # symbol should have been completed in the last step of the Earley cycle, and will be in
@@ -299,18 +297,25 @@ class Parser:
         solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0]
         if self.debug:
             from .earley_forest import ForestToPyDotVisitor
-            debug_walker = ForestToPyDotVisitor()
-            debug_walker.visit(solutions[0], "sppf.png")
+            try:
+                debug_walker = ForestToPyDotVisitor()
+            except ImportError:
+                logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
+            else:
+                debug_walker.visit(solutions[0], "sppf.png")


         if not solutions:
             expected_tokens = [t.expect for t in to_scan]
-            # raise ParseError('Incomplete parse: Could not find a solution to input')
-            raise ParseError('Unexpected end of input! Expecting a terminal of: %s' % expected_tokens)
+            raise UnexpectedEOF(expected_tokens)
         elif len(solutions) > 1:
             assert False, 'Earley should not generate multiple start symbol items!'

         # Perform our SPPF -> AST conversion using the right ForestVisitor.
-        return self.forest_tree_visitor.visit(solutions[0])
+        forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor
+        forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor())
+
+        return forest_tree_visitor.visit(solutions[0])


class ApplyCallbacks(Transformer_InPlace):


lark/parsers/xearley.py (+2, -1)

@@ -146,4 +146,5 @@ class Parser(BaseParser):
             self.predict_and_complete(i, to_scan, columns, transitives)

         ## Column is now the final column in the parse.
         assert i == len(columns)-1
+        return to_scan

lark/visitors.py (+45, -3)

@@ -3,6 +3,7 @@ from functools import wraps
 from .utils import smart_decorator
 from .tree import Tree
 from .exceptions import VisitError, GrammarError
+from .lexer import Token

 ###{standalone
 from inspect import getmembers, getmro
@@ -21,6 +22,10 @@ class Transformer:
     Can be used to implement map or reduce.
     """

+    __visit_tokens__ = False   # For backwards compatibility
+    def __init__(self, visit_tokens=False):
+        self.__visit_tokens__ = visit_tokens
+
     def _call_userfunc(self, tree, new_children=None):
         # Assumes tree is already transformed
         children = new_children if new_children is not None else tree.children
@@ -45,10 +50,29 @@ class Transformer:
         except Exception as e:
             raise VisitError(tree, e)

+    def _call_userfunc_token(self, token):
+        try:
+            f = getattr(self, token.type)
+        except AttributeError:
+            return self.__default_token__(token)
+        else:
+            try:
+                return f(token)
+            except (GrammarError, Discard):
+                raise
+            except Exception as e:
+                raise VisitError(token, e)
+
     def _transform_children(self, children):
         for c in children:
             try:
-                yield self._transform_tree(c) if isinstance(c, Tree) else c
+                if isinstance(c, Tree):
+                    yield self._transform_tree(c)
+                elif self.__visit_tokens__ and isinstance(c, Token):
+                    yield self._call_userfunc_token(c)
+                else:
+                    yield c
             except Discard:
                 pass

@@ -66,6 +90,11 @@ class Transformer:
         "Default operation on tree (for override)"
         return Tree(data, children, meta)

+    def __default_token__(self, token):
+        "Default operation on token (for override)"
+        return token
+

     @classmethod
     def _apply_decorator(cls, decorator, **kwargs):
         mro = getmro(cls)
@@ -157,6 +186,11 @@ class Visitor(VisitorBase):
             self._call_userfunc(subtree)
         return tree

+    def visit_topdown(self, tree):
+        for subtree in tree.iter_subtrees_topdown():
+            self._call_userfunc(subtree)
+        return tree

class Visitor_Recursive(VisitorBase):
"""Bottom-up visitor, recursive

@@ -169,8 +203,16 @@ class Visitor_Recursive(VisitorBase):
             if isinstance(child, Tree):
                 self.visit(child)

-        f = getattr(self, tree.data, self.__default__)
-        f(tree)
+        self._call_userfunc(tree)
         return tree

+    def visit_topdown(self, tree):
+        self._call_userfunc(tree)
+
+        for child in tree.children:
+            if isinstance(child, Tree):
+                self.visit_topdown(child)
+        return tree




mkdocs.yml (+1, -0)

@@ -9,5 +9,6 @@ pages:
   - How To Develop (Guide): how_to_develop.md
   - Grammar Reference: grammar.md
   - Tree Construction Reference: tree_construction.md
+  - Visitors and Transformers: visitors.md
   - Classes Reference: classes.md
   - Recipes: recipes.md

tests/__main__.py (+1, -1)

@@ -10,7 +10,7 @@ from .test_reconstructor import TestReconstructor
 try:
     from .test_nearley.test_nearley import TestNearley
 except ImportError:
-    pass
+    logging.warn("Warning: Skipping tests for Nearley (js2py required)")

# from .test_selectors import TestSelectors
# from .test_grammars import TestPythonG, TestConfigG


tests/test_nearley/test_nearley.py (+4, -1)

@@ -15,9 +15,12 @@ NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley')
BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin')

 if not os.path.exists(NEARLEY_PATH):
-    print("Skipping Nearley tests!")
+    logging.warn("Nearley not installed. Skipping Nearley tests!")
     raise ImportError("Skipping Nearley tests!")

+import js2py    # Ensures that js2py exists, to avoid failing tests
+

 class TestNearley(unittest.TestCase):
     def test_css(self):
         fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')


tests/test_parser.py (+18, -0)

@@ -94,6 +94,24 @@ class TestParsers(unittest.TestCase):
         r = g.parse('xx')
         self.assertEqual( r.children[0].data, "c" )

+    def test_visit_tokens(self):
+        class T(Transformer):
+            def a(self, children):
+                return children[0] + "!"
+            def A(self, tok):
+                return tok.upper()
+
+        # Test regular
+        g = Lark("""start: a
+                    a : A
+                    A: "x"
+                    """, parser='lalr')
+        r = T().transform(g.parse("x"))
+        self.assertEqual( r.children, ["x!"] )
+        r = T(True).transform(g.parse("x"))
+        self.assertEqual( r.children, ["X!"] )
+

     def test_embedded_transformer(self):
         class T(Transformer):
             def a(self, children):


tests/test_trees.py (+38, -1)

@@ -7,7 +7,7 @@ import pickle
import functools

 from lark.tree import Tree
-from lark.visitors import Transformer, Interpreter, visit_children_decor, v_args, Discard
+from lark.visitors import Visitor, Visitor_Recursive, Transformer, Interpreter, visit_children_decor, v_args, Discard


class TestTrees(TestCase):
@@ -34,6 +34,43 @@ class TestTrees(TestCase):
nodes = list(self.tree1.iter_subtrees_topdown())
self.assertEqual(nodes, expected)

def test_visitor(self):
class Visitor1(Visitor):
def __init__(self):
self.nodes=[]

def __default__(self,tree):
self.nodes.append(tree)
class Visitor1_Recursive(Visitor_Recursive):
def __init__(self):
self.nodes=[]

def __default__(self,tree):
self.nodes.append(tree)

visitor1=Visitor1()
visitor1_recursive=Visitor1_Recursive()

expected_top_down = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]),
Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]
expected_botton_up= [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'),
Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])]

visitor1.visit(self.tree1)
self.assertEqual(visitor1.nodes,expected_botton_up)

visitor1_recursive.visit(self.tree1)
self.assertEqual(visitor1_recursive.nodes,expected_botton_up)

visitor1.nodes=[]
visitor1_recursive.nodes=[]

visitor1.visit_topdown(self.tree1)
self.assertEqual(visitor1.nodes,expected_top_down)

visitor1_recursive.visit_topdown(self.tree1)
self.assertEqual(visitor1_recursive.nodes,expected_top_down)

def test_interp(self):
t = Tree('a', [Tree('b', []), Tree('c', []), 'd'])


