
Merge branch 'master' into true_lalr3

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.8.0
Erez Sh 5 years ago
parent commit 7c5917ba19
20 changed files with 410 additions and 180 deletions
  1. docs/classes.md (+12, -106)
  2. docs/grammar.md (+58, -1)
  3. docs/how_to_develop.md (+2, -2)
  4. docs/index.md (+3, -2)
  5. docs/json_tutorial.md (+10, -5)
  6. docs/parsers.md (+3, -3)
  7. docs/visitors.md (+117, -0)
  8. lark/__init__.py (+1, -1)
  9. lark/exceptions.py (+8, -0)
 10. lark/lark.py (+6, -1)
 11. lark/lexer.py (+48, -34)
 12. lark/load_grammar.py (+16, -6)
 13. lark/parsers/earley.py (+17, -12)
 14. lark/parsers/xearley.py (+2, -1)
 15. lark/visitors.py (+45, -3)
 16. mkdocs.yml (+1, -0)
 17. tests/__main__.py (+1, -1)
 18. tests/test_nearley/test_nearley.py (+4, -1)
 19. tests/test_parser.py (+18, -0)
 20. tests/test_trees.py (+38, -1)

docs/classes.md (+12, -106)

@@ -1,15 +1,13 @@
# Classes - Reference
# Classes Reference

This page details the important classes in Lark.

----

## Lark
## lark.Lark

The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.

### Methods

#### \_\_init\_\_(self, grammar, **options)

The Lark class accepts a grammar string or file object, and keyword options:
@@ -50,14 +48,10 @@ If a transformer is supplied to `__init__`, returns whatever is the result of th

The main tree class

### Properties

* `data` - The name of the rule or alias
* `children` - List of matched sub-rules and terminals
* `meta` - Line & Column numbers, if using `propagate_positions`

### Methods

#### \_\_init\_\_(self, data, children)

Creates a new tree, and stores "data" and "children" in attributes of the same name.
@@ -92,102 +86,6 @@ Trees can be hashed and compared.

----

## Transformers & Visitors

Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each methods accepts the children as an argument. That can be modified using the `v-args` decorator, which allows to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.

See: https://github.com/lark-parser/lark/blob/master/lark/visitors.py

### Visitors

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

**Example**
```python
class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```

There are two classes that implement the visitor interface:

* Visitor - Visit every node (without recursion)

* Visitor_Recursive - Visit every node using recursion. Slightly faster.

### Transformers

Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.

Transformers can be used to implement map & reduce patterns.

Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).

Transformers can be chained into a new transformer by using multiplication.

**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])
```


Here are the classes that implement the transformer interface:

- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances

### v_args

`v_args` is a decorator.

By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.

When used on a transformer/visitor class definition, it applies to all the callback methods inside it.

`v_args` accepts one of three flags:

- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.

Examples:

```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```

### Discard

When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.

## Token

When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes:
@@ -199,17 +97,25 @@ When using a lexer, the resulting tokens in the trees will be of the Token class
* `end_line` - The line where the token ends
* `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.

## Transformer
## Visitor
## Interpreter

See the [visitors page](visitors.md)


## UnexpectedInput

## UnexpectedToken

## UnexpectedException

- `UnexpectedInput`
- `UnexpectedToken` - The parser received an unexpected token
- `UnexpectedCharacters` - The lexer encountered an unexpected string

After catching one of these exceptions, you may call the following helper methods to create a nicer error message:

### Methods

#### get_context(text, span)

Returns a pretty string pinpointing the error in the text, with `span` amount of context characters around it.
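The windowing that `get_context` performs can be sketched in plain Python. This is an illustrative re-implementation of the idea (an excerpt of the error line plus a caret under the error column), not Lark's exact code:

```python
def get_context(text, pos, span=40):
    """Return a two-line excerpt pinpointing `pos` in `text`:
    the surrounding characters, then a caret under the error column."""
    start = max(pos - span, 0)
    end = pos + span
    before = text[start:pos].rsplit('\n', 1)[-1]   # error line, up to pos
    after = text[pos:end].split('\n', 1)[0]        # rest of the error line
    return before + after + '\n' + ' ' * len(before) + '^\n'

text = "a = 1\nb = $!\nc = 3\n"
print(get_context(text, text.index('$')))
# b = $!
#     ^
```

In real code, `pos` would come from the exception's `pos_in_stream` attribute.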


docs/grammar.md (+58, -1)

@@ -1,5 +1,13 @@
# Grammar Reference

Table of contents:

1. [Definitions](#defs)
1. [Terminals](#terms)
1. [Rules](#rules)
1. [Directives](#dirs)

<a name="defs"></a>
## Definitions

**A grammar** is a list of rules and terminals, that together define a language.
@@ -25,6 +33,7 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o
Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects, for the shape of the generated parse-tree, and the automatic construction of the lexer (aka tokenizer, or scanner).


<a name="terms"></a>
## Terminals

Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.
@@ -70,6 +79,53 @@ WHITESPACE: (" " | /\t/ )+
SQL_SELECT: "select"i
```

### Regular expressions & Ambiguity

Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.

For example, in the following grammar, `A1` and `A2` are equivalent:
```perl
A1: "a" | "b"
A2: /a|b/
```

This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.

For example, for this grammar:
```perl
start : (A | B)+
A : "a" | "ab"
B : "b"
```
We get this behavior:

```bash
>>> p.parse("ab")
Tree(start, [Token(A, 'a'), Token(B, 'b')])
```

This is happening because Python's regex engine always returns the first matching option.
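This first-match behavior is easy to verify with Python's `re` module directly, independently of Lark:

```python
import re

# The terminal A: "a" | "ab" compiles to the regex a|ab.
# Python's regex engine tries alternatives left to right and returns
# as soon as one succeeds -- it does not prefer the longest match.
m = re.match(r'a|ab', 'ab')
print(m.group(0))   # -> 'a', not 'ab'

# Reordering the alternatives changes the result:
m2 = re.match(r'ab|a', 'ab')
print(m2.group(0))  # -> 'ab'
```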

If you find yourself in this situation, the recommended solution is to use rules instead.

Example:

```python
>>> p = Lark("""start: (a | b)+
... !a: "a" | "ab"
... !b: "b"
... """, ambiguity="explicit")
>>> print(p.parse("ab").pretty())
_ambig
  start
    a	ab
  start
    a	a
    b	b
```


<a name="rules"></a>
## Rules

**Syntax:**
@@ -114,6 +170,7 @@ Rules can be assigned priority only when using Earley (future versions may suppo

Priority can be either positive or negative. If not specified for a terminal, it's assumed to be 1 (i.e. the default).

<a name="dirs"></a>
## Directives

### %ignore
@@ -122,7 +179,7 @@ All occurrences of the terminal will be ignored, and won't be part of the parse.

Using the `%ignore` directive results in a cleaner grammar.

It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extranous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.
It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.

**Syntax:**
```html


docs/how_to_develop.md (+2, -2)

@@ -7,7 +7,7 @@ There are many ways you can help the project:
* Write new grammars for Lark's library
* Write a blog post introducing Lark to your audience
* Port Lark to another language
* Help me with code developemnt
* Help me with code development

If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.

@@ -60,4 +60,4 @@ Another way to run the tests is using setup.py:

```bash
python setup.py test
```
```

docs/index.md (+3, -2)

@@ -35,8 +35,8 @@ $ pip install lark-parser
* [Examples](https://github.com/lark-parser/lark/tree/master/examples)
* Tutorials
* [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter
* [How to write a JSON parser](json_tutorial.md)
* External
* [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark
* Unofficial
* [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3
* Guides
* [How to use Lark](how_to_use.md)
@@ -44,6 +44,7 @@ $ pip install lark-parser
* Reference
* [Grammar](grammar.md)
* [Tree Construction](tree_construction.md)
* [Visitors & Transformers](visitors.md)
* [Classes](classes.md)
* [Cheatsheet (PDF)](lark_cheatsheet.pdf)
* Discussion


docs/json_tutorial.md (+10, -5)

@@ -230,7 +230,8 @@ from lark import Transformer
class MyTransformer(Transformer):
    def list(self, items):
        return list(items)
    def pair(self, (k,v)):
    def pair(self, key_value):
        k, v = key_value
        return k, v
    def dict(self, items):
        return dict(items)
@@ -251,9 +252,11 @@ Also, our definitions of list and dict are a bit verbose. We can do better:
from lark import Transformer

class TreeToJson(Transformer):
    def string(self, (s,)):
    def string(self, s):
        (s,) = s
        return s[1:-1]
    def number(self, (n,)):
    def number(self, n):
        (n,) = n
        return float(n)

    list = list
@@ -315,9 +318,11 @@ json_grammar = r"""
"""

class TreeToJson(Transformer):
    def string(self, (s,)):
    def string(self, s):
        (s,) = s
        return s[1:-1]
    def number(self, (n,)):
    def number(self, n):
        (n,) = n
        return float(n)

    list = list
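The signature changes in this file are needed because tuple parameters such as `def pair(self, (k, v))` are Python 2-only syntax, removed in Python 3 by PEP 3113; the unpacking now happens explicitly in the function body. A minimal standalone illustration:

```python
# Python 2 allowed:  def pair((k, v)): ...
# In Python 3 the tuple arrives as one argument and is unpacked in the body:
def pair(key_value):
    k, v = key_value
    return k, v

print(pair(("name", "lark")))   # -> ('name', 'lark')
```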


docs/parsers.md (+3, -3)

@@ -5,9 +5,9 @@ Lark implements the following parsing algorithms: Earley, LALR(1), and CYK

An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar at O(n^3), and O(n^2) when the grammar is unambiguous. It can parse most LR grammars at O(n). Most programming languages are LR, and can be parsed in linear time.

Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitely using `lexer='dynamic'`.
Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.

It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independant first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`
It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`

**SPPF & Ambiguity resolution**

@@ -21,7 +21,7 @@ Lark provides the following options to combat ambiguity:

1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.

2) Users may choose to recieve the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.

3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.



docs/visitors.md (+117, -0)

@@ -0,0 +1,117 @@
## Transformers & Visitors

Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which makes it possible to inline the arguments (akin to `*args`), or to add the tree's `meta` property as an argument.

See: <a href="https://github.com/lark-parser/lark/blob/master/lark/visitors.py">visitors.py</a>

### Visitors

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

**Example**
```python
class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```

There are two classes that implement the visitor interface:

* Visitor - Visit every node (without recursion)

* Visitor_Recursive - Visit every node using recursion. Slightly faster.

### Transformers

Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.

Transformers can be used to implement map & reduce patterns.

Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).

Transformers can be chained into a new transformer by using multiplication.

`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.


**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])
```

All these classes implement the transformer interface:

- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
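The bottom-up reduction these classes perform can be sketched without Lark at all, using a toy tree and dispatch by node `data`. This is a hypothetical minimal re-implementation (a bare `data`/`children` node, not Lark's `Tree`), meant only to show the contract that children are transformed before their parent:

```python
class Node:
    def __init__(self, data, children):
        self.data, self.children = data, children

def transform(node, callbacks):
    """Reduce a tree bottom-up: children are transformed first, so each
    callback sees already-transformed values."""
    if not isinstance(node, Node):
        return node
    children = [transform(c, callbacks) for c in node.children]
    f = callbacks.get(node.data)
    return f(children) if f else Node(node.data, children)

# 'expr' nodes are evaluated; 'a' keeps its (transformed) children
t = Node('a', [Node('expr', ['1+2'])])
result = transform(t, {'expr': lambda args: eval(args[0])})
print(result.children)   # -> [3]
```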

### visit_tokens

By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).

Example:

```python
class T(Transformer):
    INT = int
    NUMBER = float
    def NAME(self, name):
        return lookup_dict.get(name, name)


T(visit_tokens=True).transform(tree)
```
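The dispatch behind `visit_tokens` can be mimicked in isolation: a method named after the token's terminal type is looked up with `getattr`, with a pass-through default. This is an illustrative sketch with a hypothetical minimal `Token` stand-in, not Lark's internals verbatim:

```python
class Token(str):
    """Minimal stand-in: a string that also carries a terminal type."""
    def __new__(cls, type_, value):
        obj = super().__new__(cls, value)
        obj.type = type_
        return obj

class TokenVisitor:
    def _call_userfunc_token(self, token):
        # Dispatch on the terminal name; unknown types pass through unchanged.
        f = getattr(self, token.type, None)
        return f(token) if f else token

class T(TokenVisitor):
    INT = int                 # class attribute works as a callback too
    def NAME(self, tok):
        return tok.upper()

t = T()
print(t._call_userfunc_token(Token('INT', '42')))     # -> 42
print(t._call_userfunc_token(Token('NAME', 'foo')))   # -> FOO
print(t._call_userfunc_token(Token('OTHER', 'x')))    # -> x (unchanged)
```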


### v_args

`v_args` is a decorator.

By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.

When used on a transformer/visitor class definition, it applies to all the callback methods inside it.

`v_args` accepts one of three flags:

- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.

Examples:

```python
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::-1]
```

### Discard

When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.



lark/__init__.py (+1, -1)

@@ -5,4 +5,4 @@ from .exceptions import ParseError, LexError, GrammarError, UnexpectedToken, Une
from .lexer import Token
from .lark import Lark

-__version__ = "0.7.4"
+__version__ = "0.8.0rc1"

lark/exceptions.py (+8, -0)

@@ -13,6 +13,14 @@ class ParseError(LarkError):
 class LexError(LarkError):
     pass

+class UnexpectedEOF(ParseError):
+    def __init__(self, expected):
+        self.expected = expected
+
+        message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected))
+        super(UnexpectedEOF, self).__init__(message)


 class UnexpectedInput(LarkError):
     pos_in_stream = None



lark/lark.py (+6, -1)

@@ -69,6 +69,7 @@ class LarkOptions(Serialize):
         'propagate_positions': False,
         'lexer_callbacks': {},
         'maybe_placeholders': False,
+        'edit_terminals': None,
     }

     def __init__(self, options_dict):
@@ -85,7 +86,7 @@ class LarkOptions(Serialize):

             options[name] = value

-        if isinstance(options['start'], str):
+        if isinstance(options['start'], STRING_TYPE):
             options['start'] = [options['start']]

         self.__dict__['options'] = options
@@ -205,6 +206,10 @@ class Lark(Serialize):
         # Compile the EBNF grammar into BNF
         self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)

+        if self.options.edit_terminals:
+            for t in self.terminals:
+                self.options.edit_terminals(t)
+
         self._terminals_dict = {t.name:t for t in self.terminals}

         # If the user asked to invert the priorities, negate them all here.


lark/lexer.py (+48, -34)

@@ -3,7 +3,7 @@
import re

from .utils import Str, classify, get_regexp_width, Py36, Serialize
-from .exceptions import UnexpectedCharacters, LexError
+from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken

###{standalone

@@ -43,7 +43,7 @@ class PatternStr(Pattern):
     __serialize_fields__ = 'value', 'flags'

     type = "str"
     def to_regexp(self):
         return self._get_flags(re.escape(self.value))

@@ -166,36 +166,33 @@ class _Lex:

         while line_ctr.char_pos < len(stream):
             lexer = self.lexer
-            for mre, type_from_index in lexer.mres:
-                m = mre.match(stream, line_ctr.char_pos)
-                if not m:
-                    continue
-
-                t = None
-                value = m.group(0)
-                type_ = type_from_index[m.lastindex]
-                if type_ not in ignore_types:
-                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
-                    if t.type in lexer.callback:
-                        t = lexer.callback[t.type](t)
-                        if not isinstance(t, Token):
-                            raise ValueError("Callbacks must return a token (returned %r)" % t)
-                    last_token = t
-                    yield t
-                else:
-                    if type_ in lexer.callback:
-                        t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
-                        lexer.callback[type_](t)
-
-                line_ctr.feed(value, type_ in newline_types)
-                if t:
-                    t.end_line = line_ctr.line
-                    t.end_column = line_ctr.column
-
-                break
-            else:
-                allowed = {v for m, tfi in lexer.mres for v in tfi.values()}
-                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+            res = lexer.match(stream, line_ctr.char_pos)
+            if not res:
+                allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types
+                if not allowed:
+                    allowed = {"<END-OF-FILE>"}
+                raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+
+            value, type_ = res
+
+            t = None
+            if type_ not in ignore_types:
+                t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+                if t.type in lexer.callback:
+                    t = lexer.callback[t.type](t)
+                    if not isinstance(t, Token):
+                        raise ValueError("Callbacks must return a token (returned %r)" % t)
+                last_token = t
+                yield t
+            else:
+                if type_ in lexer.callback:
+                    t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+                    lexer.callback[type_](t)
+
+            line_ctr.feed(value, type_ in newline_types)
+            if t:
+                t.end_line = line_ctr.line
+                t.end_column = line_ctr.column


class UnlessCallback:
@@ -330,6 +327,11 @@ class TraditionalLexer(Lexer):

         self.mres = build_mres(terminals)

+    def match(self, stream, pos):
+        for mre, type_from_index in self.mres:
+            m = mre.match(stream, pos)
+            if m:
+                return m.group(0), type_from_index[m.lastindex]
+
     def lex(self, stream):
         return _Lex(self).lex(stream, self.newline_types, self.ignore_types)
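The union-regex trick behind this `match` method can be shown standalone: terminals are combined into one alternation of named groups, and `m.lastindex` identifies which terminal matched. This is a simplified sketch of the mechanism (real Lark also handles flags, priorities, and splits the union across several compiled regexes):

```python
import re

terminals = [('NUMBER', r'\d+'), ('NAME', r'[a-z]+'), ('WS', r'\s+')]
# One big alternation of named groups, in terminal order.
mre = re.compile('|'.join('(?P<%s>%s)' % t for t in terminals))
# Map each group index back to the terminal name.
type_from_index = {i + 1: name for i, (name, _) in enumerate(terminals)}

def match(stream, pos):
    m = mre.match(stream, pos)
    if m:
        return m.group(0), type_from_index[m.lastindex]

print(match("ab 12", 0))   # -> ('ab', 'NAME')
print(match("ab 12", 3))   # -> ('12', 'NUMBER')
print(match("!?", 0))      # -> None (no terminal matches)
```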
@@ -367,9 +369,21 @@ class ContextualLexer(Lexer):

     def lex(self, stream):
         l = _Lex(self.lexers[self.parser_state], self.parser_state)
-        for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
-            yield x
-            l.lexer = self.lexers[self.parser_state]
-            l.state = self.parser_state
+        try:
+            for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
+                yield x
+                l.lexer = self.lexers[self.parser_state]
+                l.state = self.parser_state
+        except UnexpectedCharacters as e:
+            # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined,
+            # but not in the current context.
+            # This tests the input against the global context, to provide a nicer error.
+            root_match = self.root_lexer.match(stream, e.pos_in_stream)
+            if not root_match:
+                raise
+
+            value, type_ = root_match
+            t = Token(type_, value, e.pos_in_stream, e.line, e.column)
+            raise UnexpectedToken(t, e.allowed, state=e.state)

###}

lark/load_grammar.py (+16, -6)

@@ -479,7 +479,7 @@ class Grammar:
# ===================

         # Convert terminal-trees to strings/regexps
-        transformer = PrepareLiterals() * TerminalTreeToPattern()
         for name, (term_tree, priority) in term_defs:
             if term_tree is None:  # Terminal added through %declare
                 continue
@@ -487,7 +487,8 @@ class Grammar:
             if len(expansions) == 1 and not expansions[0].children:
                 raise GrammarError("Terminals cannot be empty (%s)" % name)

-        terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
+        transformer = PrepareLiterals() * TerminalTreeToPattern()
+        terminals = [TerminalDef(name, transformer.transform( term_tree ), priority)
                      for name, (term_tree, priority) in term_defs if term_tree]

# =================
@@ -638,11 +639,10 @@ def import_from_grammar_into_namespace(grammar, namespace, aliases):


 def resolve_term_references(term_defs):
-    # TODO Cycles detection
     # TODO Solve with transitive closure (maybe)

-    token_dict = {k:t for k, (t,_p) in term_defs}
-    assert len(token_dict) == len(term_defs), "Same name defined twice?"
+    term_dict = {k:t for k, (t,_p) in term_defs}
+    assert len(term_dict) == len(term_defs), "Same name defined twice?"

     while True:
         changed = False
@@ -655,11 +655,21 @@ def resolve_term_references(term_defs):
                 if item.type == 'RULE':
                     raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name))
                 if item.type == 'TERMINAL':
-                    exp.children[0] = token_dict[item]
+                    term_value = term_dict[item]
+                    assert term_value is not None
+                    exp.children[0] = term_value
                     changed = True
         if not changed:
             break

+    for name, term in term_dict.items():
+        if term:    # Not just declared
+            for child in term.children:
+                ids = [id(x) for x in child.iter_subtrees()]
+                if id(term) in ids:
+                    raise GrammarError("Recursion in terminal '%s' (recursion is only allowed in rules, not terminals)" % name)


 def options_from_rule(name, *x):
     if len(x) > 1:
         priority, expansions = x
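The `id()`-based cycle check added to `resolve_term_references` can be demonstrated on a toy tree. This is a hypothetical minimal stand-in for Lark's `Tree` (note the walk de-duplicates by `id()`, as Lark's `iter_subtrees` does, so a cyclic reference cannot loop forever):

```python
class Tree:
    def __init__(self, data, children):
        self.data, self.children = data, children

    def iter_subtrees(self):
        # Breadth-first walk, de-duplicated by id(), so shared or
        # cyclic references are visited at most once.
        seen = {}
        queue = [self]
        for subtree in queue:           # queue grows while we iterate
            if id(subtree) in seen:
                continue
            seen[id(subtree)] = subtree
            queue += [c for c in subtree.children if isinstance(c, Tree)]
        return seen.values()

def is_recursive(term):
    # A terminal is recursive if its own node is reachable from a child.
    for child in term.children:
        if isinstance(child, Tree):
            if id(term) in [id(x) for x in child.iter_subtrees()]:
                return True
    return False

a = Tree('A', [])
a.children.append(Tree('seq', [a]))   # A references itself -> recursion
b = Tree('B', [Tree('seq', ['b'])])   # plain terminal, no cycle
print(is_recursive(a), is_recursive(b))   # -> True False
```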


lark/parsers/earley.py (+17, -12)

@@ -10,10 +10,11 @@ is better documented here:
http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/
"""

+import logging
 from collections import deque

 from ..visitors import Transformer_InPlace, v_args
-from ..exceptions import ParseError, UnexpectedToken
+from ..exceptions import UnexpectedEOF, UnexpectedToken
 from .grammar_analysis import GrammarAnalyzer
 from ..grammar import NonTerminal
 from .earley_common import Item, TransitiveItem
@@ -45,12 +46,8 @@ class Parser:
         # skip the extra tree walk. We'll also skip this if the user just didn't specify priorities
         # on any rules.
         if self.forest_sum_visitor is None and rule.options and rule.options.priority is not None:
-            self.forest_sum_visitor = ForestSumVisitor()
+            self.forest_sum_visitor = ForestSumVisitor

-        if resolve_ambiguity:
-            self.forest_tree_visitor = ForestToTreeVisitor(self.callbacks, self.forest_sum_visitor)
-        else:
-            self.forest_tree_visitor = ForestToAmbiguousTreeVisitor(self.callbacks, self.forest_sum_visitor)
         self.term_matcher = term_matcher


@@ -273,6 +270,7 @@ class Parser:

         ## Column is now the final column in the parse.
         assert i == len(columns)-1
+        return to_scan

     def parse(self, stream, start):
         assert start, start
@@ -291,7 +289,7 @@ class Parser:
             else:
                 columns[0].add(item)

-        self._parse(stream, columns, to_scan, start_symbol)
+        to_scan = self._parse(stream, columns, to_scan, start_symbol)

         # If the parse was successful, the start
         # symbol should have been completed in the last step of the Earley cycle, and will be in
@@ -299,18 +297,25 @@ class Parser:
         solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0]
         if self.debug:
             from .earley_forest import ForestToPyDotVisitor
-            debug_walker = ForestToPyDotVisitor()
-            debug_walker.visit(solutions[0], "sppf.png")
+            try:
+                debug_walker = ForestToPyDotVisitor()
+            except ImportError:
+                logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
+            else:
+                debug_walker.visit(solutions[0], "sppf.png")


         if not solutions:
             expected_tokens = [t.expect for t in to_scan]
-            # raise ParseError('Incomplete parse: Could not find a solution to input')
-            raise ParseError('Unexpected end of input! Expecting a terminal of: %s' % expected_tokens)
+            raise UnexpectedEOF(expected_tokens)
         elif len(solutions) > 1:
             assert False, 'Earley should not generate multiple start symbol items!'

         # Perform our SPPF -> AST conversion using the right ForestVisitor.
-        return self.forest_tree_visitor.visit(solutions[0])
+        forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor
+        forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor())
+
+        return forest_tree_visitor.visit(solutions[0])


class ApplyCallbacks(Transformer_InPlace):


lark/parsers/xearley.py (+2, -1)

@@ -146,4 +146,5 @@ class Parser(BaseParser):
             self.predict_and_complete(i, to_scan, columns, transitives)

         ## Column is now the final column in the parse.
         assert i == len(columns)-1
+        return to_scan

lark/visitors.py (+45, -3)

@@ -3,6 +3,7 @@ from functools import wraps
 from .utils import smart_decorator
 from .tree import Tree
 from .exceptions import VisitError, GrammarError
+from .lexer import Token

 ###{standalone
 from inspect import getmembers, getmro
@@ -21,6 +22,10 @@ class Transformer:
     Can be used to implement map or reduce.
     """

+    __visit_tokens__ = False   # For backwards compatibility
+    def __init__(self, visit_tokens=False):
+        self.__visit_tokens__ = visit_tokens
+
     def _call_userfunc(self, tree, new_children=None):
         # Assumes tree is already transformed
         children = new_children if new_children is not None else tree.children
@@ -45,10 +50,29 @@ class Transformer:
         except Exception as e:
             raise VisitError(tree, e)

+    def _call_userfunc_token(self, token):
+        try:
+            f = getattr(self, token.type)
+        except AttributeError:
+            return self.__default_token__(token)
+        else:
+            try:
+                return f(token)
+            except (GrammarError, Discard):
+                raise
+            except Exception as e:
+                raise VisitError(token, e)
+
     def _transform_children(self, children):
         for c in children:
             try:
-                yield self._transform_tree(c) if isinstance(c, Tree) else c
+                if isinstance(c, Tree):
+                    yield self._transform_tree(c)
+                elif self.__visit_tokens__ and isinstance(c, Token):
+                    yield self._call_userfunc_token(c)
+                else:
+                    yield c
             except Discard:
                 pass

@@ -66,6 +90,11 @@ class Transformer:
         "Default operation on tree (for override)"
         return Tree(data, children, meta)

+    def __default_token__(self, token):
+        "Default operation on token (for override)"
+        return token
+

     @classmethod
     def _apply_decorator(cls, decorator, **kwargs):
         mro = getmro(cls)
@@ -157,6 +186,11 @@ class Visitor(VisitorBase):
             self._call_userfunc(subtree)
         return tree

+    def visit_topdown(self, tree):
+        for subtree in tree.iter_subtrees_topdown():
+            self._call_userfunc(subtree)
+        return tree

class Visitor_Recursive(VisitorBase):
"""Bottom-up visitor, recursive

@@ -169,8 +203,16 @@ class Visitor_Recursive(VisitorBase):
             if isinstance(child, Tree):
                 self.visit(child)

-        f = getattr(self, tree.data, self.__default__)
-        f(tree)
+        self._call_userfunc(tree)
         return tree

+    def visit_topdown(self, tree):
+        self._call_userfunc(tree)
+
+        for child in tree.children:
+            if isinstance(child, Tree):
+                self.visit_topdown(child)
+        return tree




mkdocs.yml (+1, -0)

@@ -9,5 +9,6 @@ pages:
   - How To Develop (Guide): how_to_develop.md
   - Grammar Reference: grammar.md
   - Tree Construction Reference: tree_construction.md
+  - Visitors and Transformers: visitors.md
   - Classes Reference: classes.md
   - Recipes: recipes.md

tests/__main__.py (+1, -1)

@@ -10,7 +10,7 @@ from .test_reconstructor import TestReconstructor
 try:
     from .test_nearley.test_nearley import TestNearley
 except ImportError:
-    pass
+    logging.warn("Warning: Skipping tests for Nearley (js2py required)")

# from .test_selectors import TestSelectors
# from .test_grammars import TestPythonG, TestConfigG


tests/test_nearley/test_nearley.py (+4, -1)

@@ -15,9 +15,12 @@ NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley')
BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin')

 if not os.path.exists(NEARLEY_PATH):
-    print("Skipping Nearley tests!")
+    logging.warn("Nearley not installed. Skipping Nearley tests!")
     raise ImportError("Skipping Nearley tests!")

+import js2py    # Ensures that js2py exists, to avoid failing tests
+

 class TestNearley(unittest.TestCase):
     def test_css(self):
         fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')


tests/test_parser.py (+18, -0)

@@ -94,6 +94,24 @@ class TestParsers(unittest.TestCase):
         r = g.parse('xx')
         self.assertEqual( r.children[0].data, "c" )

+    def test_visit_tokens(self):
+        class T(Transformer):
+            def a(self, children):
+                return children[0] + "!"
+            def A(self, tok):
+                return tok.upper()
+
+        # Test regular
+        g = Lark("""start: a
+                    a : A
+                    A: "x"
+                    """, parser='lalr')
+        r = T().transform(g.parse("x"))
+        self.assertEqual( r.children, ["x!"] )
+        r = T(True).transform(g.parse("x"))
+        self.assertEqual( r.children, ["X!"] )
+

     def test_embedded_transformer(self):
         class T(Transformer):
             def a(self, children):


tests/test_trees.py (+38, -1)

@@ -7,7 +7,7 @@ import pickle
import functools

 from lark.tree import Tree
-from lark.visitors import Transformer, Interpreter, visit_children_decor, v_args, Discard
+from lark.visitors import Visitor, Visitor_Recursive, Transformer, Interpreter, visit_children_decor, v_args, Discard


class TestTrees(TestCase):
@@ -34,6 +34,43 @@ class TestTrees(TestCase):
nodes = list(self.tree1.iter_subtrees_topdown())
self.assertEqual(nodes, expected)

def test_visitor(self):
class Visitor1(Visitor):
def __init__(self):
self.nodes=[]

def __default__(self,tree):
self.nodes.append(tree)
class Visitor1_Recursive(Visitor_Recursive):
def __init__(self):
self.nodes=[]

def __default__(self,tree):
self.nodes.append(tree)

visitor1=Visitor1()
visitor1_recursive=Visitor1_Recursive()

expected_top_down = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]),
Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]
expected_botton_up= [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'),
Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])]

visitor1.visit(self.tree1)
self.assertEqual(visitor1.nodes,expected_botton_up)

visitor1_recursive.visit(self.tree1)
self.assertEqual(visitor1_recursive.nodes,expected_botton_up)

visitor1.nodes=[]
visitor1_recursive.nodes=[]

visitor1.visit_topdown(self.tree1)
self.assertEqual(visitor1.nodes,expected_top_down)

visitor1_recursive.visit_topdown(self.tree1)
self.assertEqual(visitor1_recursive.nodes,expected_top_down)

def test_interp(self):
t = Tree('a', [Tree('b', []), Tree('c', []), 'd'])


