
Merge branch 'evalable_repr' into master

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.10.0
TG-Techie, 4 years ago, committed by GitHub
commit 9cebf44091
100 changed files with 7844 additions and 1428 deletions
  1. +1 -1  .github/workflows/tests.yml
  2. +2 -0  .gitignore
  3. +14 -32  README.md
  4. +20 -0  docs/Makefile
  5. +0 -0  docs/_static/comparison_memory.png
  6. +0 -0  docs/_static/comparison_runtime.png
  7. +0 -0  docs/_static/lark_cheatsheet.pdf
  8. +0 -185  docs/classes.md
  9. +70 -0  docs/classes.rst
  10. +184 -0  docs/conf.py
  11. +9 -3  docs/features.md
  12. +41 -12  docs/grammar.md
  13. +7 -6  docs/how_to_use.md
  14. +101 -0  docs/ide/app.html
  15. +105 -0  docs/ide/app.js
  16. +83 -0  docs/ide/app/app.py
  17. +3152 -0  docs/ide/app/core.py
  18. +150 -0  docs/ide/app/examples.py
  19. +475 -0  docs/ide/app/ext.py
  20. +9 -0  docs/ide/app/files.json
  21. +6 -0  docs/ide/app/html5.py
  22. +186 -0  docs/ide/app/ignite.py
  23. +101 -0  docs/ide/app/utils.py
  24. BIN  docs/ide/is-loading.gif
  25. BIN  docs/ide/lark-logo.png
  26. +0 -53  docs/index.md
  27. +113 -0  docs/index.rst
  28. +1 -2  docs/json_tutorial.md
  29. +36 -0  docs/make.bat
  30. +47 -0  docs/nearley.md
  31. +5 -5  docs/parsers.md
  32. +2 -2  docs/philosophy.md
  33. +1 -1  docs/recipes.md
  34. +2 -0  docs/requirements.txt
  35. +3 -3  docs/tree_construction.md
  36. +0 -125  docs/visitors.md
  37. +102 -0  docs/visitors.rst
  38. +0 -34  examples/README.md
  39. +21 -0  examples/README.rst
  40. +2 -0  examples/advanced/README.rst
  41. +64 -0  examples/advanced/_json_parser.py
  42. +44 -0  examples/advanced/conf_earley.py
  43. +40 -0  examples/advanced/conf_lalr.py
  44. +9 -8  examples/advanced/custom_lexer.py
  45. +37 -0  examples/advanced/error_puppet.py
  46. +8 -5  examples/advanced/error_reporting_lalr.py
  47. +0 -0  examples/advanced/python2.lark
  48. +0 -0  examples/advanced/python3.lark
  49. +12 -8  examples/advanced/python_bytecode.py
  50. +15 -15  examples/advanced/python_parser.py
  51. +11 -7  examples/advanced/qscintilla_json.py
  52. +11 -7  examples/advanced/reconstruct_json.py
  53. +0 -0  examples/advanced/template_lark.lark
  54. +29 -0  examples/advanced/templates.py
  55. +7 -3  examples/calc.py
  56. +0 -42  examples/conf_earley.py
  57. +0 -38  examples/conf_lalr.py
  58. +12 -3  examples/fruitflies.py
  59. +12 -9  examples/indented_tree.py
  60. +6 -6  examples/json_parser.py
  61. +19 -9  examples/lark_grammar.py
  62. +636 -354  examples/standalone/json_parser.py
  63. +6 -1  examples/turtle_dsl.py
  64. +20 -10  lark-stubs/exceptions.pyi
  65. +7 -1  lark-stubs/lark.pyi
  66. +4 -1  lark-stubs/lexer.pyi
  67. +0 -0  lark-stubs/parsers/__init__.pyi
  68. +22 -0  lark-stubs/parsers/lalr_puppet.pyi
  69. +1 -1  lark-stubs/reconstruct.pyi
  70. +2 -1  lark/__init__.py
  71. +6 -6  lark/common.py
  72. +95 -26  lark/exceptions.py
  73. +165 -87  lark/lark.py
  74. +73 -33  lark/lexer.py
  75. +76 -36  lark/load_grammar.py
  76. +81 -0  lark/parse_tree_builder.py
  77. +32 -15  lark/parser_frontends.py
  78. +4 -4  lark/parsers/earley.py
  79. +81 -4  lark/parsers/earley_forest.py
  80. +1 -1  lark/parsers/grammar_analysis.py
  81. +4 -5  lark/parsers/lalr_analysis.py
  82. +16 -13  lark/parsers/lalr_parser.py
  83. +128 -0  lark/parsers/lalr_puppet.py
  84. +33 -93  lark/reconstruct.py
  85. +19 -13  lark/tools/nearley.py
  86. +40 -2  lark/tools/standalone.py
  87. +40 -5  lark/tree.py
  88. +177 -0  lark/tree_matcher.py
  89. +81 -49  lark/utils.py
  90. +75 -14  lark/visitors.py
  91. +0 -14  mkdocs.yml
  92. +7 -5  readthedocs.yml
  93. +10 -2  setup.py
  94. +1 -0  test-requirements.txt
  95. +6 -2  tests/__main__.py
  96. +16 -0  tests/test_cache.py
  97. +31 -0  tests/test_grammar.py
  98. +65 -0  tests/test_logger.py
  99. +4 -3  tests/test_nearley/test_nearley.py
  100. +392 -3  tests/test_parser.py

+1 -1  .github/workflows/tests.yml

@@ -22,7 +22,7 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r nearley-requirements.txt
pip install -r test-requirements.txt
- name: Run tests
run: |
python -m tests

+2 -0  .gitignore

@@ -10,3 +10,5 @@ tags
.mypy_cache
/dist
/build
docs/_build
docs/examples

+14 -32  README.md

@@ -1,18 +1,18 @@
# Lark - a modern parsing library for Python
# Lark - a parsing toolkit for Python

Lark is a parser built with a focus on ergonomics, performance and resilience.
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

Lark can parse all context-free languages. That means it is capable of parsing almost any programming language out there, and to some degree most natural languages too.
Lark can parse all context-free languages. To put it simply, it means that it is capable of parsing almost any programming language out there, and to some degree most natural languages too.

**Who is it for?**

- **Beginners**: Lark is very friendly for experimentation. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs an annotated parse-tree for you, using only the grammar, and it gives you convenient and flexible tools to process that parse-tree.
- **Beginners**: Lark is very friendly for experimentation. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs an annotated parse-tree for you, using only the grammar and an input, and it gives you convenient and flexible tools to process that parse-tree.

- **Experts**: Lark implements both Earley(SPPF) and LALR(1), and several different lexers, so you can trade-off power and speed, according to your requirements. It also provides a variety of sophisticated features and utilities.

**What can it do?**

- Parse all context-free grammars, and handle any ambiguity
- Parse all context-free grammars, and handle any ambiguity gracefully
- Build an annotated parse-tree automagically, no construction code required.
- Provide first-rate performance in terms of both Big-O complexity and measured run-time (considering that this is Python ;)
- Run on every Python interpreter (it's pure-python)
@@ -25,14 +25,15 @@ Most importantly, Lark will save you time and prevent you from getting parsing h
### Quick links

- [Documentation @readthedocs](https://lark-parser.readthedocs.io/)
- [Cheatsheet (PDF)](/docs/lark_cheatsheet.pdf)
- [Cheatsheet (PDF)](/docs/_static/lark_cheatsheet.pdf)
- [Online IDE (very basic)](https://lark-parser.github.io/lark/ide/app.html)
- [Tutorial](/docs/json_tutorial.md) for writing a JSON parser.
- Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/)
- [Gitter chat](https://gitter.im/lark-parser/Lobby)

### Install Lark

$ pip install lark-parser
$ pip install lark-parser --upgrade

Lark has no dependencies.

@@ -76,12 +77,11 @@ Notice punctuation doesn't appear in the resulting tree. It's automatically filt

### Fruit flies like bananas

Lark is great at handling ambiguity. Let's parse the phrase "fruit flies like bananas":
Lark is great at handling ambiguity. Here is the result of parsing the phrase "fruit flies like bananas":

![fruitflies.png](examples/fruitflies.png)

See more [examples here](https://github.com/lark-parser/lark/tree/master/examples)

[Read the code here](https://github.com/lark-parser/lark/tree/master/examples/fruitflies.py), and [more examples here](https://github.com/lark-parser/lark/tree/master/examples)


## List of main features
@@ -99,7 +99,7 @@ See more [examples here](https://github.com/lark-parser/lark/tree/master/example
- **Python 2 & 3** compatible
- Automatic line & column tracking
- Standard library of terminals (strings, numbers, names, etc.)
- Import grammars from Nearley.js
- Import grammars from Nearley.js ([read more](/docs/nearley.md))
- Extensive test suite [![codecov](https://codecov.io/gh/erezsh/lark/branch/master/graph/badge.svg)](https://codecov.io/gh/erezsh/lark)
- MyPy support using type stubs
- And much more!
@@ -113,9 +113,9 @@ See the full list of [features here](https://lark-parser.readthedocs.io/en/lates

Lark is the fastest and lightest (lower is better)

![Run-time Comparison](docs/comparison_runtime.png)
![Run-time Comparison](docs/_static/comparison_runtime.png)

![Memory Usage Comparison](docs/comparison_memory.png)
![Memory Usage Comparison](docs/_static/comparison_memory.png)


Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made.
@@ -155,28 +155,10 @@ Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more detail
- [miniwdl](https://github.com/chanzuckerberg/miniwdl) - A static analysis toolkit for the Workflow Description Language
- [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer
- [harmalysis](https://github.com/napulen/harmalysis) - A language for harmonic analysis and music theory
- [gersemi](https://github.com/BlankSpruce/gersemi) - A CMake code formatter

Using Lark? Send me a message and I'll add your project!

### How to use Nearley grammars in Lark

Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for Javascript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the Javascript postprocessing code segments.

Here's an example:
```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
```

You can use the output as a regular python module:

```python
>>> import ncalc
>>> ncalc.parse('sin(pi/4) ^ e')
0.38981434460254655
```


## License

Lark uses the [MIT license](LICENSE).


+20 -0  docs/Makefile

@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = Lark
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/comparison_memory.png → docs/_static/comparison_memory.png


docs/comparison_runtime.png → docs/_static/comparison_runtime.png


docs/lark_cheatsheet.pdf → docs/_static/lark_cheatsheet.pdf


+0 -185  docs/classes.md

@@ -1,185 +0,0 @@
# Classes Reference

This page details the important classes in Lark.

----

## lark.Lark

The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.

#### \_\_init\_\_(self, grammar_string, **options)

Creates an instance of Lark with the given grammar

#### open(cls, grammar_filename, rel_to=None, **options)

Creates an instance of Lark with the grammar given by its filename

If rel_to is provided, the function will find the grammar filename in relation to it.

Example:

```python
>>> Lark.open("grammar_file.lark", rel_to=__file__, parser="lalr")
Lark(...)
```

#### parse(self, text)

Return a complete parse tree for the text (of type Tree)

If a transformer is supplied to `__init__`, returns whatever is the result of the transformation.
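
For example, a minimal sketch (it reuses the `hello.lark` grammar that ships with the examples; any grammar would do):

```python
from lark import Lark

parser = Lark(r"""
    start: WORD "," WORD "!"

    %import common.WORD
    %ignore " "
""")

tree = parser.parse("Hello, World!")
print(tree.pretty())
```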


#### save(self, f) / load(cls, f)

Useful for caching and multiprocessing.

`save` saves the instance into the given file object

`load` loads an instance from the given file object

####


### Lark Options
#### General options

**start** - The start symbol. Either a string, or a list of strings for multiple possible starts (Default: "start")

**debug** - Display debug information, such as warnings (default: False)

**transformer** - Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster)

**propagate_positions** - Propagates (line, column, end_line, end_column) attributes into all tree branches.

**maybe_placeholders** -
- When True, the `[]` operator returns `None` when not matched.
- When `False`, `[]` behaves like the `?` operator, and returns no value at all.
- (default=`False`. Recommended to set to `True`)

**g_regex_flags** - Flags that are applied to all terminals (both regex and strings)

**keep_all_tokens** - Prevent the tree builder from automagically removing "punctuation" tokens (default: False)

**cache** - Cache the results of the Lark grammar analysis, for x2 to x3 faster loading. LALR only for now.
- When `False`, does nothing (default)
- When `True`, caches to a temporary file in the local directory
- When given a string, caches to the path pointed by the string
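
To illustrate a few of these options together, here is a hedged sketch (the grammar and inputs are invented):

```python
from lark import Lark

parser = Lark(r"""
    start: "a" [NAME]

    %import common.CNAME -> NAME
    %ignore " "
""", parser="lalr", maybe_placeholders=True)

print(parser.parse("a").children)      # [None] - the unmatched [NAME] leaves a placeholder
print(parser.parse("a foo").children)  # [Token('NAME', 'foo')]
```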

#### Algorithm

**parser** - Decides which parser engine to use, "earley" or "lalr". (Default: "earley")
(there is also a "cyk" option for legacy)

**lexer** - Decides whether or not to use a lexer stage

- "auto" (default): Choose for me based on the parser
- "standard": Use a standard lexer
- "contextual": Stronger lexer (only works with parser="lalr")
- "dynamic": Flexible and powerful (only with parser="earley")
- "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible. (only with parser="earley")

**ambiguity** - Decides how to handle ambiguity in the parse. Only relevant if parser="earley"
- "resolve": The parser will automatically choose the simplest derivation (it chooses consistently: greedy for tokens, non-greedy for rules)
- "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest).

#### Domain Specific

- **postlex** - Lexer post-processing (Default: None) Only works with the standard and contextual lexers.
- **priority** - How priorities should be evaluated - auto, none, normal, invert (Default: auto)
- **lexer_callbacks** - Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
- **edit_terminals** - A callback

----

## Tree

The main tree class

* `data` - The name of the rule or alias
* `children` - List of matched sub-rules and terminals
* `meta` - Line & Column numbers (if `propagate_positions` is enabled)
* meta attributes: `line`, `column`, `start_pos`, `end_line`, `end_column`, `end_pos`

#### \_\_init\_\_(self, data, children)

Creates a new tree, and stores "data" and "children" in attributes of the same name.

#### pretty(self, indent_str=' ')

Returns an indented string representation of the tree. Great for debugging.

#### find_pred(self, pred)

Returns all nodes of the tree that evaluate pred(node) as true.

#### find_data(self, data)

Returns all nodes of the tree whose data equals the given data.

#### iter_subtrees(self)

Depth-first iteration.

Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG).

#### iter_subtrees_topdown(self)

Breadth-first iteration.

Iterates over all the subtrees, returning nodes in order like pretty() does.

#### \_\_eq\_\_, \_\_hash\_\_

Trees can be hashed and compared.
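
Putting a few of these methods together, as an illustrative sketch (the grammar and input are invented):

```python
from lark import Lark

parser = Lark(r"""
    start: pair+
    pair: NAME ":" NAME

    %import common.CNAME -> NAME
    %import common.WS
    %ignore WS
""")

tree = parser.parse("a: b  c: d")
print(tree.pretty())

# find_data() yields every subtree whose rule (data) is "pair"
for pair in tree.find_data("pair"):
    key, value = pair.children
    print(key, "->", value)
```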

----

## Token

When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes:

* `type` - Name of the token (as specified in grammar).
* `pos_in_stream` - the index of the token in the text
* `line` - The line of the token in the text (starting with 1)
* `column` - The column of the token in the text (starting with 1)
* `end_line` - The line where the token ends
* `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.
* `end_pos` - the index where the token ends (basically pos_in_stream + len(token))
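
A small sketch showing the attributes listed above, using a throwaway grammar:

```python
from lark import Lark

parser = Lark(r"""
    start: WORD+

    %import common.WORD
    %ignore " "
""", parser="lalr")

# With this grammar, the children of the root are Token instances
for tok in parser.parse("fruit flies").children:
    print(tok.type, str(tok), tok.line, tok.column, tok.end_column)
```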

## Transformer
## Visitor
## Interpreter

See the [visitors page](visitors.md)


## UnexpectedInput

## UnexpectedToken

## UnexpectedException

- `UnexpectedInput`
- `UnexpectedToken` - The parser received an unexpected token
- `UnexpectedCharacters` - The lexer encountered an unexpected string

After catching one of these exceptions, you may call the following helper methods to create a nicer error message:

#### get_context(text, span)

Returns a pretty string pinpointing the error in the text, with `span` amount of context characters around it.

(The parser doesn't hold a copy of the text it has to parse, so you have to provide it again)
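
A hedged sketch of how this might look (the grammar and the bad input are invented for illustration):

```python
from lark import Lark
from lark.exceptions import UnexpectedInput

parser = Lark('start: "a"+', parser="lalr")

text = "aaa!aaa"
try:
    parser.parse(text)
except UnexpectedInput as u:
    # Pass the original text back in, with 5 characters of context around the error
    print(u.get_context(text, span=5))
```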

#### match_examples(parse_fn, examples)

Allows you to detect what's wrong in the input text by matching against example errors.

Accepts the parse function (usually `lark_instance.parse`) and a dictionary of `{'example_string': value}`.

The function will iterate the dictionary until it finds a matching error, and return the corresponding value.

For an example usage, see: [examples/error_reporting_lalr.py](https://github.com/lark-parser/lark/blob/master/examples/error_reporting_lalr.py)

+70 -0  docs/classes.rst

@@ -0,0 +1,70 @@
API Reference
=============

Lark
----

.. autoclass:: lark.Lark
:members: open, parse, save, load


Using Unicode character classes with ``regex``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Python's builtin ``re`` module has a few persistent known bugs and also won't parse
advanced regex features such as character classes.
With ``pip install lark-parser[regex]``, the ``regex`` module will be
installed alongside lark and can act as a drop-in replacement to ``re``.

Any instance of Lark instantiated with ``regex=True`` will use the ``regex`` module instead of ``re``.

For example, we can use character classes to match PEP-3131 compliant Python identifiers:

::

from lark import Lark
>>> g = Lark(r"""
?start: NAME
NAME: ID_START ID_CONTINUE*
ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
""", regex=True)

>>> g.parse('வணக்கம்')
'வணக்கம்'


Tree
----

.. autoclass:: lark.Tree
:members: pretty, find_pred, find_data, iter_subtrees,
iter_subtrees_topdown

Token
-----

.. autoclass:: lark.Token

Transformer, Visitor & Interpreter
----------------------------------

See :doc:`visitors`.

UnexpectedInput
---------------

.. autoclass:: lark.exceptions.UnexpectedInput
:members: get_context, match_examples

.. autoclass:: lark.exceptions.UnexpectedToken

.. autoclass:: lark.exceptions.UnexpectedCharacters

.. _parserpuppet:

ParserPuppet
------------

.. autoclass:: lark.parsers.lalr_puppet.ParserPuppet
:members: choices, feed_token, copy, pretty, resume_parse

+184 -0  docs/conf.py

@@ -0,0 +1,184 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Lark documentation build configuration file, created by
# sphinx-quickstart on Sun Aug 16 13:09:41 2020.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
autodoc_member_order = 'bysource'


# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinx.ext.coverage',
'recommonmark',
'sphinx_gallery.gen_gallery'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = {
'.rst': 'restructuredtext',
'.md': 'markdown'
}


# The master toctree document.
master_doc = 'index'

# General information about the project.
project = 'Lark'
copyright = '2020, Erez Shinan'
author = 'Erez Shinan'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = ''
# The full version, including alpha/beta/rc tags.
release = ''

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
'**': [
'relations.html', # needs 'show_related': True theme option to display
'searchbox.html',
]
}


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'Larkdoc'


# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',

# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',

# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',

# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'Lark.tex', 'Lark Documentation',
'Erez Shinan', 'manual'),
]


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'lark', 'Lark Documentation',
[author], 1)
]


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'Lark', 'Lark Documentation',
author, 'Lark', 'One line description of project.',
'Miscellaneous'),
]

# -- Sphinx gallery config -------------------------------------------

sphinx_gallery_conf = {
'examples_dirs': ['../examples'],
'gallery_dirs': ['examples'],
}

+9 -3  docs/features.md

@@ -1,4 +1,6 @@
# Main Features
# Features

## Main Features
- Earley parser, capable of parsing any context-free grammar
- Implements SPPF, for efficient parsing and storing of ambiguous grammars.
- LALR(1) parser, limited in power of expression, but very efficient in space and performance (O(n)).
@@ -6,6 +8,7 @@
- EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md))
- Builds a parse-tree (AST) automagically based on the grammar
- Stand-alone parser generator - create a small independent parser to embed in your project.
- Flexible error handling by using a "puppet parser" mechanism (LALR only)
- Automatic line & column tracking (for both tokens and matched rules)
- Automatic terminal collision resolution
- Standard library of terminals (strings, numbers, names, etc.)
@@ -17,11 +20,14 @@

[Read more about the parsers](parsers.md)

# Extra features
## Extra features

- Import rules and tokens from other Lark grammars, for code reuse and modularity.
- Import grammars from Nearley.js
- Support for external regex module ([see here](classes.md#using-unicode-character-classes-with-regex))
- Import grammars from Nearley.js ([read more](nearley.md))
- CYK parser
- Visualize your parse trees as dot or png files ([see_example](https://github.com/lark-parser/lark/blob/master/examples/fruitflies.py))


### Experimental features
- Automatic reconstruction of input from parse-tree (see examples)


+41 -12  docs/grammar.md

@@ -1,13 +1,5 @@
# Grammar Reference

Table of contents:

1. [Definitions](#defs)
1. [Terminals](#terms)
1. [Rules](#rules)
1. [Directives](#dirs)

<a name="defs"></a>
## Definitions

A **grammar** is a list of rules and terminals, that together define a language.
@@ -20,7 +12,7 @@ Each rule is a list of terminals and rules, whose location and nesting define th

A **parsing algorithm** is an algorithm that takes a grammar definition and a sequence of symbols (members of the alphabet), and matches the entirety of the sequence by searching for a structure that is allowed by the grammar.

## General Syntax and notes
### General Syntax and notes

Grammars in Lark are based on [EBNF](https://en.wikipedia.org/wiki/Extended_Backus–Naur_form) syntax, with several enhancements.

@@ -58,7 +50,6 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o
Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects, for the shape of the generated parse-tree, and the automatic construction of the lexer (aka tokenizer, or scanner).


<a name="terms"></a>
## Terminals

Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.
@@ -83,12 +74,50 @@ Terminals also support grammar operators, such as `|`, `+`, `*` and `?`.

Terminals are a linear construct, and therefore may not contain themselves (recursion isn't allowed).

### Templates

Templates are expanded when preprocessing the grammar.

Definition syntax:

```ebnf
my_template{param1, param2, ...}: <EBNF EXPRESSION>
```

Use syntax:

```ebnf
some_rule: my_template{arg1, arg2, ...}
```

Example:
```ebnf
_separated{x, sep}: x (sep x)* // Define a sequence of 'x sep x sep x ...'

num_list: "[" _separated{NUMBER, ","} "]" // Will match "[1, 2, 3]" etc.
```

### Priority

Terminals can be assigned priority only when using a lexer (future versions may support Earley's dynamic lexing).

Priority can be either positive or negative. If not specified for a terminal, it defaults to 1.

Highest priority terminals are always matched first.
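
For example (an illustrative snippet, not taken from the sections above), a priority is attached to the terminal name with a dot:

```ebnf
SELECT.2: "select"   // tried before NAME, so "select" lexes as SELECT
NAME: /[a-z]+/
```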

### Regexp Flags

You can use flags on regexps and strings. For example:

```perl
SELECT: "select"i //# Will ignore case, and match SELECT or Select, etc.
MULTILINE_TEXT: /.+/s
```

Supported flags are one of: `imslu`. See Python's regex documentation for more details on each one.

Regexps/strings of different flags can only be concatenated in Python 3.6+

#### Notes for when using a lexer:

When using a lexer (standard or contextual), it is the grammar-author's responsibility to make sure the literals don't collide, or that if they do, they are matched in the desired order. Literals are matched according to the following precedence:
@@ -154,7 +183,6 @@ _ambig
```


<a name="rules"></a>
## Rules

**Syntax:**
@@ -176,7 +204,7 @@ Each item is one of:
* `TERMINAL`
* `"string literal"` or `/regexp literal/`
* `(item item ..)` - Group items
* `[item item ..]` - Maybe. Same as `(item item ..)?`, but generates `None` if there is no match
* `[item item ..]` - Maybe. Same as `(item item ..)?`, but when `maybe_placeholders=True`, generates `None` if there is no match.
* `item?` - Zero or one instances of item ("maybe")
* `item*` - Zero or more instances of item
* `item+` - One or more instances of item
@@ -256,3 +284,4 @@ Note that `%ignore` directives cannot be imported. Imported rules will abide by
### %declare

Declare a terminal without defining it. Useful for plugins.


+7 -6  docs/how_to_use.md

@@ -22,20 +22,21 @@ Of course, some specific use-cases may deviate from this process. Feel free to s

Browse the [Examples](https://github.com/lark-parser/lark/tree/master/examples) to find a template that suits your purposes.

Read the tutorials to get a better understanding of how everything works. (links in the [main page](/))
Read the tutorials to get a better understanding of how everything works. (links in the [main page](/index))

Use the [Cheatsheet (PDF)](lark_cheatsheet.pdf) for quick reference.
Use the [Cheatsheet (PDF)](/_static/lark_cheatsheet.pdf) for quick reference.

Use the reference pages for more in-depth explanations. (links in the [main page](/))
Use the reference pages for more in-depth explanations. (links in the [main page](/index))

## LALR usage

By default Lark silently resolves Shift/Reduce conflicts as Shift. To enable warnings pass `debug=True`. To get the messages printed you have to configure `logging` framework beforehand. For example:
By default Lark silently resolves Shift/Reduce conflicts as Shift. To enable warnings pass `debug=True`. To get the messages printed you have to configure the `logger` beforehand. For example:

```python
from lark import Lark
import logging
logging.basicConfig(level=logging.DEBUG)
from lark import Lark, logger

logger.setLevel(logging.DEBUG)

collision_grammar = '''
start: as as


+101 -0  docs/ide/app.html

@@ -0,0 +1,101 @@
<!doctype html>
<html>
<head>
<meta charset="UTF-8">

<!-- flip comment below to use local pyodide -->
<script src="https://pyodide-cdn2.iodide.io/v0.15.0/full/pyodide.js"></script>
<!-- <script src="./pyodide/pyodide.js"></script> -->

<link rel="stylesheet" href="https://unpkg.com/purecss@1.0.1/build/base-min.css">
<link href="https://fonts.googleapis.com/css2?family=Inconsolata:wght@500&display=swap" rel="stylesheet">

<script src="app.js"></script>
<style>
.is-loading:after {
background-image: url(is-loading.gif);
background-position: center 35%;
background-repeat: no-repeat;
background-color: hsla(0, 0%, 100%, .6);
position: absolute;
z-index: 700;
content: " ";
width: 100%;
height: 100%;
display: block;
left: 0;
right: 0;
top: 0;
bottom: 0
}

h1 {
text-align: center;
}

textarea, select, body > div > ul {
/* display: block;
margin: 15px auto;
width: 90%;
font-weight: bold;
color: #052569; */
font-family: 'Inconsolata', monospace;
}

textarea {
margin: 10px;
width: 90%;
padding: 10px;
font-size: 1.4em;
}

#grammar {
min-height: 300px;
}
#input {
min-height: 100px;
}

ul ul {
border-left: 1px dotted silver;
margin-left: -16px;
}

li {
list-style: circle;
margin-left: 10px;
}

select {
padding: 5px;
}

#inputs {
text-align: center;
}

#result {
display: flex;
justify-content: center;
}

#result > ul {
margin: 10px;
width: 90%;
padding: 10px;
font-size: 1.2em;
}

menu {
display: flex;
}

main {
margin: auto;
}

</style>
</head>
<body class="is-loading">
</body>
</html>

+105 -0  docs/ide/app.js

@@ -0,0 +1,105 @@
class app {

constructor(modules, invocation){
languagePluginLoader.then(() => {
// If you don't require for pre-loaded Python packages, remove this promise below.
window.pyodide.runPythonAsync("import setuptools, micropip").then(()=>{
window.pyodide.runPythonAsync("micropip.install('lark-parser')").then(()=>{
this.fetchSources(modules).then(() => {
window.pyodide.runPythonAsync("import " + Object.keys(modules).join("\nimport ") + "\n" + invocation + "\n").then(() => this.initializingComplete());
});
});
});
});
}

loadSources(module, baseURL, files) {
let promises = [];

for (let f in files) {
promises.push(
new Promise((resolve, reject) => {
let file = files[f];
let url = (baseURL ? baseURL + "/" : "") + file;

fetch(url, {}).then((response) => {
if (response.status === 200)
return response.text().then((code) => {
let path = ("/lib/python3.7/site-packages/" + module + "/" + file).split("/");
let lookup = "";

for (let i in path) {
if (!path[i]) {
continue;
}

lookup += (lookup ? "/" : "") + path[i];

if (parseInt(i) === path.length - 1) {
window.pyodide._module.FS.writeFile(lookup, code);
console.debug(`fetched ${lookup}`);
} else {
try {
window.pyodide._module.FS.lookupPath(lookup);
} catch {
window.pyodide._module.FS.mkdir(lookup);
console.debug(`created ${lookup}`);
}
}
}

resolve();
});
else
reject();
});
})
);
}

return Promise.all(promises);
}

fetchSources(modules) {
let promises = [];

for( let module of Object.keys(modules) )
{
promises.push(
new Promise((resolve, reject) => {
fetch(`${modules[module]}/files.json`, {}).then((response) => {
if (response.status === 200) {
response.text().then((list) => {
let files = JSON.parse(list);

this.loadSources(module, modules[module], files).then(() => {
resolve();
})
})
} else {
reject();
}
})
}));
}

return Promise.all(promises).then(() => {
for( let module of Object.keys(modules) ) {
window.pyodide.loadedPackages[module] = "default channel";
}

window.pyodide.runPython(
'import importlib as _importlib\n' +
'_importlib.invalidate_caches()\n'
);
});
}

initializingComplete() {
document.body.classList.remove("is-loading")
}
}

(function () {
window.top.app = new app({"app": "app"}, "import app.app; app.app.start()");
})();

+83 -0  docs/ide/app/app.py

@@ -0,0 +1,83 @@
from . import html5
from .examples import examples

from lark import Lark
from lark.tree import Tree


class App(html5.Div):
def __init__(self):
super().__init__("""
<h1>
<img src="lark-logo.png"> IDE
</h1>

<main>
<menu>
<select [name]="examples">
<option disabled selected>Examples</option>
</select>
<select [name]="parser">
<option value="earley" selected>Earley (default)</option>
<option value="lalr">LALR</option>
<option value="cyk">CYK</option>
</select>
</menu>
<div id="inputs">
<div>
<div>Grammar:</div>
<textarea [name]="grammar" id="grammar" placeholder="Lark Grammar..."></textarea>
</div>
<div>
<div>Input:</div>
<textarea [name]="input" id="input" placeholder="Parser input..."></textarea>
</div>
</div>
<div id="result">
<ul [name]="ast" />
</div>
</main>
""")
self.sinkEvent("onKeyUp", "onChange")

self.parser = "earley"

# Pre-load examples
for name, (grammar, input) in examples.items():
option = html5.Option(name)
option.grammar = grammar
option.input = input

self.examples.appendChild(option)

def onChange(self, e):
if html5.utils.doesEventHitWidgetOrChildren(e, self.examples):
example = self.examples.children(self.examples["selectedIndex"])
self.grammar["value"] = example.grammar.strip()
self.input["value"] = example.input.strip()
self.onKeyUp()

elif html5.utils.doesEventHitWidgetOrChildren(e, self.parser):
self.parser = self.parser.children(self.parser["selectedIndex"])["value"]
self.onKeyUp()

def onKeyUp(self, e=None):
l = Lark(self.grammar["value"], parser=self.parser)

try:
ast = l.parse(self.input["value"])
except Exception as e:
self.ast.appendChild(
html5.Li(str(e)), replace=True
)

print(ast)
traverse = lambda node: html5.Li([node.data, html5.Ul([traverse(c) for c in node.children])] if isinstance(node, Tree) else node)
self.ast.appendChild(traverse(ast), replace=True)


def start():
html5.Body().appendChild(
App()
)


+3152 -0  docs/ide/app/core.py (file diff suppressed because it is too large)


+150 -0  docs/ide/app/examples.py

@@ -0,0 +1,150 @@

# Examples formatted this way:
# "name": ("grammar", "demo-input")

examples = {

# --- hello.lark ---
"hello.lark": ("""
start: WORD "," WORD "!"

%import common.WORD // imports from terminal library
%ignore " " // Disregard spaces in text
""", "Hello, World!"),

# --- calc.lark ---
"calc.lark": ("""
?start: sum
| NAME "=" sum -> assign_var

?sum: product
| sum "+" product -> add
| sum "-" product -> sub

?product: atom
| product "*" atom -> mul
| product "/" atom -> div

?atom: NUMBER -> number
| "-" atom -> neg
| NAME -> var
| "(" sum ")"

%import common.CNAME -> NAME
%import common.NUMBER
%import common.WS_INLINE
%ignore WS_INLINE""",
"1 + 2 * 3 + 4"),

# --- json.lark ---
"json.lark": ("""
?start: value
?value: object
| array
| string
| SIGNED_NUMBER -> number
| "true" -> true
| "false" -> false
| "null" -> null
array : "[" [value ("," value)*] "]"
object : "{" [pair ("," pair)*] "}"
pair : string ":" value
string : ESCAPED_STRING
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS
%ignore WS""",
"""
[
{
"_id": "5edb875cf3d764da55602437",
"index": 0,
"guid": "3dae2206-5d4d-41fe-b81d-dc8cdba7acaa",
"isActive": false,
"balance": "$2,872.54",
"picture": "http://placehold.it/32x32",
"age": 24,
"eyeColor": "blue",
"name": "Theresa Vargas",
"gender": "female",
"company": "GEEKOL",
"email": "theresavargas@geekol.com",
"phone": "+1 (930) 450-3445",
"address": "418 Herbert Street, Sexton, Florida, 1375",
"about": "Id minim deserunt laborum enim. Veniam commodo incididunt amet aute esse duis veniam occaecat nulla esse aute et deserunt eiusmod. Anim elit ullamco minim magna sint laboris. Est consequat quis deserunt excepteur in magna pariatur laborum quis eu. Ex quis tempor elit qui qui et culpa sunt sit esse mollit cupidatat. Fugiat cillum deserunt enim minim irure reprehenderit est. Voluptate nisi quis amet quis incididunt pariatur nostrud Lorem consectetur adipisicing voluptate.\\r\\n",
"registered": "2016-11-19T01:02:42 -01:00",
"latitude": -25.65267,
"longitude": 104.19531,
"tags": [
"eiusmod",
"reprehenderit",
"anim",
"sunt",
"esse",
"proident",
"esse"
],
"friends": [
{
"id": 0,
"name": "Roth Herrera"
},
{
"id": 1,
"name": "Callie Christian"
},
{
"id": 2,
"name": "Gracie Whitfield"
}
],
"greeting": "Hello, Theresa Vargas! You have 6 unread messages.",
"favoriteFruit": "banana"
},
{
"_id": "5edb875c845eb08161a83e64",
"index": 1,
"guid": "a8ada2c1-e2c7-40d3-96b4-52c93baff7f0",
"isActive": false,
"balance": "$2,717.04",
"picture": "http://placehold.it/32x32",
"age": 23,
"eyeColor": "green",
"name": "Lily Ross",
"gender": "female",
"company": "RODEOMAD",
"email": "lilyross@rodeomad.com",
"phone": "+1 (941) 465-3561",
"address": "525 Beekman Place, Blodgett, Marshall Islands, 3173",
"about": "Aliquip duis proident excepteur eiusmod in quis officia consequat culpa eu et ut. Occaecat reprehenderit tempor mollit do eu magna qui et magna exercitation aliqua. Incididunt exercitation dolor proident eiusmod minim occaecat. Sunt et minim mollit et veniam sint ex. Duis ullamco elit aute eu excepteur reprehenderit officia.\\r\\n",
"registered": "2019-11-02T04:06:42 -01:00",
"latitude": 17.031701,
"longitude": -42.657106,
"tags": [
"id",
"non",
"culpa",
"reprehenderit",
"esse",
"elit",
"sit"
],
"friends": [
{
"id": 0,
"name": "Ursula Maldonado"
},
{
"id": 1,
"name": "Traci Huff"
},
{
"id": 2,
"name": "Taylor Holt"
}
],
"greeting": "Hello, Lily Ross! You have 3 unread messages.",
"favoriteFruit": "strawberry"
}
]""")
}

+475 -0  docs/ide/app/ext.py

@@ -0,0 +1,475 @@
# -*- coding: utf-8 -*-
from . import core as html5
from . import utils

class Button(html5.Button):

def __init__(self, txt=None, callback=None, className=None, *args, **kwargs):
super().__init__(*args, **kwargs)
self["class"] = "btn"

if className:
self.addClass(className)

self["type"] = "button"

if txt is not None:
self.setText(txt)

self.callback = callback
self.sinkEvent("onClick")

def setText(self, txt):
if txt is not None:
self.element.innerHTML = txt
self["title"] = txt
else:
self.element.innerHTML = ""
self["title"] = ""

def onClick(self, event):
event.stopPropagation()
event.preventDefault()
if self.callback is not None:
self.callback(self)


class Input(html5.Input):
def __init__(self, type="text", placeholder=None, callback=None, id=None, focusCallback=None, *args, **kwargs):
"""

:param type: Input type. Default: "text
:param placeholder: Placeholder text. Default: None
:param callback: Function to be called onChanged: callback(id, value)
:param id: Optional id of the input element. Will be passed to callback
:return:
"""
super().__init__(*args, **kwargs)
self["class"] = "input"
self["type"] = type
if placeholder is not None:
self["placeholder"] = placeholder

self.callback = callback
if id is not None:
self["id"] = id
self.sinkEvent("onChange")

self.focusCallback = focusCallback
if focusCallback:
self.sinkEvent("onFocus")

def onChange(self, event):
event.stopPropagation()
event.preventDefault()
if self.callback is not None:
self.callback(self, self["id"], self["value"])

def onFocus(self, event):
event.stopPropagation()
event.preventDefault()
if self.focusCallback is not None:
self.focusCallback(self, self["id"], self["value"])

def onDetach(self):
super().onDetach()
self.callback = None


class Popup(html5.Div):
def __init__(self, title=None, id=None, className=None, icon=None, enableShortcuts=True, closeable=True, *args, **kwargs):
super().__init__("""
<div class="box" [name]="popupBox">
<div class="box-head" [name]="popupHead">
<div class="item" [name]="popupHeadItem">
<div class="item-image">
<i class="i i--small" [name]="popupIcon"></i>
</div>
<div class="item-content">
<div class="item-headline" [name]="popupHeadline"></div>
</div>
</div>
</div>
<div class="box-body box--content" [name]="popupBody"></div>
<div class="box-foot box--content bar" [name]="popupFoot"></div>
</div>
""")

self.appendChild = self.popupBody.appendChild
self.fromHTML = lambda *args, **kwargs: self.popupBody.fromHTML(*args, **kwargs) if kwargs.get("bindTo") else self.popupBody.fromHTML(bindTo=self, *args, **kwargs)

self["class"] = "popup popup--center is-active"
if className:
self.addClass(className)

if closeable:
closeBtn = Button("&times;", self.close, className="item-action")
closeBtn.removeClass("btn")
self.popupHeadItem.appendChild(closeBtn)

if title:
self.popupHeadline.appendChild(title)

if icon:
self.popupIcon.appendChild(icon[0])
elif title:
self.popupIcon.appendChild(title[0])
else:
self.popupIcon.appendChild("Vi") #fixme!!! this _LIBRARY_ is not only used in the Vi...

# id can be used to pass information to callbacks
self.id = id

#FIXME: Implement a global overlay! One popupOverlay next to a list of popups.
self.popupOverlay = html5.Div()
self.popupOverlay["class"] = "popup-overlay is-active"

self.enableShortcuts = enableShortcuts
self.onDocumentKeyDownMethod = None

self.popupOverlay.appendChild(self)
html5.Body().appendChild(self.popupOverlay)

#FIXME: Close/Cancel every popup with click on popupCloseBtn without removing the global overlay.

def onAttach(self):
super(Popup, self).onAttach()

if self.enableShortcuts:
self.onDocumentKeyDownMethod = self.onDocumentKeyDown # safe reference to method
html5.document.addEventListener("keydown", self.onDocumentKeyDownMethod)

def onDetach(self):
super(Popup, self).onDetach()

if self.enableShortcuts:
html5.document.removeEventListener("keydown", self.onDocumentKeyDownMethod)

def onDocumentKeyDown(self, event):
if html5.isEscape(event):
self.close()

def close(self, *args, **kwargs):
html5.Body().removeChild(self.popupOverlay)
self.popupOverlay = None



class InputDialog(Popup):
def __init__(self, text, value="", successHandler=None, abortHandler=None,
successLbl="OK", abortLbl="Cancel", placeholder="", *args, **kwargs):

super().__init__(*args, **kwargs)
self.addClass("popup--inputdialog")

self.sinkEvent("onKeyDown", "onKeyUp")

self.successHandler = successHandler
self.abortHandler = abortHandler

self.fromHTML(
"""
<div class="input-group">
<label class="label">
{{text}}
</label>
<input class="input" [name]="inputElem" value="{{value}}" placeholder="{{placeholder}}" />
</div>
""",
vars={
"text": text,
"value": value,
"placeholder": placeholder
}
)

# Cancel
self.popupFoot.appendChild(Button(abortLbl, self.onCancel, className="btn--cancel btn--danger"))

# Okay
self.okayBtn = Button(successLbl, self.onOkay, className="btn--okay btn--primary")
if not value:
self.okayBtn.disable()

self.popupFoot.appendChild(self.okayBtn)

self.inputElem.focus()

def onKeyDown(self, event):
if html5.isReturn(event) and self.inputElem["value"]:
event.stopPropagation()
event.preventDefault()
self.onOkay()

def onKeyUp(self, event):
if self.inputElem["value"]:
self.okayBtn.enable()
else:
self.okayBtn.disable()

def onDocumentKeyDown(self, event):
if html5.isEscape(event):
event.stopPropagation()
event.preventDefault()
self.onCancel()

def onOkay(self, *args, **kwargs):
if self.successHandler:
self.successHandler(self, self.inputElem["value"])
self.close()

def onCancel(self, *args, **kwargs):
if self.abortHandler:
self.abortHandler(self, self.inputElem["value"])
self.close()


class Alert(Popup):
"""
Just displaying an alerting message box with OK-button.
"""

def __init__(self, msg, title=None, className=None, okCallback=None, okLabel="OK", icon="!", closeable=True, *args, **kwargs):
super().__init__(title, className=None, icon=icon, closeable=closeable, *args, **kwargs)
self.addClass("popup--alert")

if className:
self.addClass(className)

self.okCallback = okCallback

message = html5.Span()
message.addClass("alert-msg")
self.popupBody.appendChild(message)

if isinstance(msg, str):
msg = msg.replace("\n", "<br>")

message.appendChild(msg, bindTo=False)

self.sinkEvent("onKeyDown")

if closeable:
okBtn = Button(okLabel, callback=self.onOkBtnClick)
okBtn.addClass("btn--okay btn--primary")
self.popupFoot.appendChild(okBtn)

okBtn.focus()

def drop(self):
self.okCallback = None
self.close()

def onOkBtnClick(self, sender=None):
if self.okCallback:
self.okCallback(self)

self.drop()

def onKeyDown(self, event):
if html5.isReturn(event):
event.stopPropagation()
event.preventDefault()
self.onOkBtnClick()


class YesNoDialog(Popup):
def __init__(self, question, title=None, yesCallback=None, noCallback=None,
yesLabel="Yes", noLabel="No", icon="?",
closeable=False, *args, **kwargs):
super().__init__(title, closeable=closeable, icon=icon, *args, **kwargs)
self.addClass("popup--yesnodialog")

self.yesCallback = yesCallback
self.noCallback = noCallback

lbl = html5.Span()
lbl["class"].append("question")
self.popupBody.appendChild(lbl)

if isinstance(question, html5.Widget):
lbl.appendChild(question)
else:
utils.textToHtml(lbl, question)

if len(noLabel):
btnNo = Button(noLabel, className="btn--no", callback=self.onNoClicked)
#btnNo["class"].append("btn--no")
self.popupFoot.appendChild(btnNo)

btnYes = Button(yesLabel, callback=self.onYesClicked)
btnYes["class"].append("btn--yes")
self.popupFoot.appendChild(btnYes)

self.sinkEvent("onKeyDown")
btnYes.focus()

def onKeyDown(self, event):
if html5.isReturn(event):
event.stopPropagation()
event.preventDefault()
self.onYesClicked()

def onDocumentKeyDown(self, event):
if html5.isEscape(event):
event.stopPropagation()
event.preventDefault()
self.onNoClicked()

def drop(self):
self.yesCallback = None
self.noCallback = None
self.close()

def onYesClicked(self, *args, **kwargs):
if self.yesCallback:
self.yesCallback(self)

self.drop()

def onNoClicked(self, *args, **kwargs):
if self.noCallback:
self.noCallback(self)

self.drop()


class SelectDialog(Popup):

def __init__(self, prompt, items=None, title=None, okBtn="OK", cancelBtn="Cancel", forceSelect=False,
callback=None, *args, **kwargs):
super().__init__(title, *args, **kwargs)
self["class"].append("popup--selectdialog")

self.callback = callback
self.items = items
assert isinstance(self.items, list)

# Prompt
if prompt:
lbl = html5.Span()
lbl["class"].append("prompt")

if isinstance(prompt, html5.Widget):
lbl.appendChild(prompt)
else:
utils.textToHtml(lbl, prompt)

self.popupBody.appendChild(lbl)

# Items
if not forceSelect and len(items) <= 3:
for idx, item in enumerate(items):
if isinstance(item, dict):
title = item.get("title")
cssc = item.get("class")
elif isinstance(item, tuple):
title = item[1]
cssc = None
else:
title = item

btn = Button(title, callback=self.onAnyBtnClick)
btn.idx = idx

if cssc:
btn.addClass(cssc)

self.popupBody.appendChild(btn)
else:
self.select = html5.Select()
self.popupBody.appendChild(self.select)

for idx, item in enumerate(items):
if isinstance(item, dict):
title = item.get("title")
elif isinstance(item, tuple):
title = item[1]
else:
title = item

opt = html5.Option(title)
opt["value"] = str(idx)

self.select.appendChild(opt)

if okBtn:
self.popupFoot.appendChild(Button(okBtn, callback=self.onOkClick))

if cancelBtn:
self.popupFoot.appendChild(Button(cancelBtn, callback=self.onCancelClick))

def onAnyBtnClick(self, sender):
item = self.items[sender.idx]

if isinstance(item, dict) and item.get("callback") and callable(item["callback"]):
item["callback"](item)

if self.callback:
self.callback(item)

self.items = None
self.close()

def onCancelClick(self, sender=None):
self.close()

def onOkClick(self, sender=None):
assert self.select["selectedIndex"] >= 0
item = self.items[int(self.select.children(self.select["selectedIndex"])["value"])]

if isinstance(item, dict) and item.get("callback") and callable(item["callback"]):
item["callback"](item)

if self.callback:
self.callback(item)

self.items = None
self.select = None
self.close()


class TextareaDialog(Popup):
def __init__(self, text, value="", successHandler=None, abortHandler=None, successLbl="OK", abortLbl="Cancel",
*args, **kwargs):
super().__init__(*args, **kwargs)
self["class"].append("popup--textareadialog")

self.successHandler = successHandler
self.abortHandler = abortHandler

span = html5.Span()
span.element.innerHTML = text
self.popupBody.appendChild(span)

self.inputElem = html5.Textarea()
self.inputElem["value"] = value
self.popupBody.appendChild(self.inputElem)

okayBtn = Button(successLbl, self.onOkay)
okayBtn["class"].append("btn--okay")
self.popupFoot.appendChild(okayBtn)

cancelBtn = Button(abortLbl, self.onCancel)
cancelBtn["class"].append("btn--cancel")
self.popupFoot.appendChild(cancelBtn)

self.sinkEvent("onKeyDown")

self.inputElem.focus()

def onDocumentKeyDown(self, event):
if html5.isEscape(event):
event.stopPropagation()
event.preventDefault()
self.onCancel()

def onOkay(self, *args, **kwargs):
if self.successHandler:
self.successHandler(self, self.inputElem["value"])
self.close()

def onCancel(self, *args, **kwargs):
if self.abortHandler:
self.abortHandler(self, self.inputElem["value"])
self.close()

+9 -0  docs/ide/app/files.json

@@ -0,0 +1,9 @@
[
"app.py",
"examples.py",
"html5.py",
"core.py",
"ext.py",
"ignite.py",
"utils.py"
]

+6 -0  docs/ide/app/html5.py

@@ -0,0 +1,6 @@
#-*- coding: utf-8 -*-

from .core import *
from . import ext, utils, ignite



+186 -0  docs/ide/app/ignite.py

@@ -0,0 +1,186 @@
# -*- coding: utf-8 -*-
from . import core as html5


@html5.tag
class Label(html5.Label):
_parserTagName = "ignite-label"

def __init__(self, *args, **kwargs):
super(Label, self).__init__(style="label ignt-label", *args, **kwargs)


@html5.tag
class Input(html5.Input):
_parserTagName = "ignite-input"

def __init__(self, *args, **kwargs):
super(Input, self).__init__(style="input ignt-input", *args, **kwargs)


@html5.tag
class Switch(html5.Div):
_parserTagName = "ignite-switch"

def __init__(self, *args, **kwargs):
super(Switch, self).__init__(style="switch ignt-switch", *args, **kwargs)

self.input = html5.Input(style="switch-input")
self.appendChild(self.input)
self.input["type"] = "checkbox"

switchLabel = html5.Label(forElem=self.input)
switchLabel.addClass("switch-label")
self.appendChild(switchLabel)

def _setChecked(self, value):
self.input["checked"] = bool(value)

def _getChecked(self):
return self.input["checked"]


@html5.tag
class Check(html5.Input):
_parserTagName = "ignite-check"

def __init__(self, *args, **kwargs):
super(Check, self).__init__(style="check ignt-check", *args, **kwargs)

checkInput = html5.Input()
checkInput.addClass("check-input")
checkInput["type"] = "checkbox"
self.appendChild(checkInput)

checkLabel = html5.Label(forElem=checkInput)
checkLabel.addClass("check-label")
self.appendChild(checkLabel)


@html5.tag
class Radio(html5.Div):
_parserTagName = "ignite-radio"

def __init__(self, *args, **kwargs):
super(Radio, self).__init__(style="radio ignt-radio", *args, **kwargs)

radioInput = html5.Input()
radioInput.addClass("radio-input")
radioInput["type"] = "radio"
self.appendChild(radioInput)

radioLabel = html5.Label(forElem=radioInput)
radioLabel.addClass("radio-label")
self.appendChild(radioLabel)


@html5.tag
class Select(html5.Select):
_parserTagName = "ignite-select"

def __init__(self, *args, **kwargs):
super(Select, self).__init__(style="select ignt-select", *args, **kwargs)

defaultOpt = html5.Option()
defaultOpt["selected"] = True
defaultOpt["disabled"] = True
defaultOpt.element.innerHTML = ""
self.appendChild(defaultOpt)


@html5.tag
class Textarea(html5.Textarea):
_parserTagName = "ignite-textarea"

def __init__(self, *args, **kwargs):
super(Textarea, self).__init__(style="textarea ignt-textarea", *args, **kwargs)


@html5.tag
class Progress(html5.Progress):
_parserTagName = "ignite-progress"

def __init__(self, *args, **kwargs):
super(Progress, self).__init__(style="progress ignt-progress", *args, **kwargs)


@html5.tag
class Item(html5.Div):
_parserTagName = "ignite-item"

def __init__(self, title=None, descr=None, className=None, *args, **kwargs):
super(Item, self).__init__(style="item ignt-item", *args, **kwargs)
if className:
self.addClass(className)

self.fromHTML("""
<div class="item-image ignt-item-image" [name]="itemImage">
</div>
<div class="item-content ignt-item-content" [name]="itemContent">
<div class="item-headline ignt-item-headline" [name]="itemHeadline">
</div>
</div>
""")

if title:
self.itemHeadline.appendChild(html5.TextNode(title))

if descr:
self.itemSubline = html5.Div()
self.addClass("item-subline ignt-item-subline")
self.itemSubline.appendChild(html5.TextNode(descr))
self.appendChild(self.itemSubline)


@html5.tag
class Table(html5.Table):
_parserTagName = "ignite-table"

def __init__(self, *args, **kwargs):
super(Table, self).__init__(*args, **kwargs)
self.head.addClass("ignt-table-head")
self.body.addClass("ignt-table-body")

def prepareRow(self, row):
assert row >= 0, "Cannot create rows with negative index"

for child in self.body._children:
row -= child["rowspan"]
if row < 0:
return

while row >= 0:
tableRow = html5.Tr()
tableRow.addClass("ignt-table-body-row")
self.body.appendChild(tableRow)
row -= 1

def prepareCol(self, row, col):
assert col >= 0, "Cannot create cols with negative index"
self.prepareRow(row)

for rowChild in self.body._children:
row -= rowChild["rowspan"]

if row < 0:
for colChild in rowChild._children:
col -= colChild["colspan"]
if col < 0:
return

while col >= 0:
tableCell = html5.Td()
tableCell.addClass("ignt-table-body-cell")
rowChild.appendChild(tableCell)
col -= 1

return
def fastGrid( self, rows, cols, createHidden=False ):
colsstr = "".join(['<td class="ignt-table-body-cell"></td>' for i in range(0, cols)])
tblstr = '<tbody [name]="body" class="ignt-table-body" >'

for r in range(0, rows):
tblstr += '<tr class="ignt-table-body-row %s">%s</tr>' %("is-hidden" if createHidden else "",colsstr)
tblstr +="</tbody>"

self.fromHTML(tblstr)

+ 101
- 0
docs/ide/app/utils.py View File

@@ -0,0 +1,101 @@
# -*- coding: utf-8 -*-
from . import core as html5

def unescape(val, maxLength = 0):
"""
Unescapes several HTML-escaped characters in a string.

:param val: The value to be unescaped.
:type val: str

:param maxLength: Cut-off after maxLength characters.
A value of 0 means "unlimited". (default)
:type maxLength: int

:returns: The unescaped string.
:rtype: str
"""
val = val \
.replace("&lt;", "<") \
.replace("&gt;", ">") \
.replace("&quot;", "\"") \
.replace("&#39;", "'")

if maxLength > 0:
return val[0:maxLength]

return val

def doesEventHitWidgetOrParents(event, widget):
"""
Test if event 'event' hits widget 'widget' (or *any* of its parents)
"""
while widget:
if event.target == widget.element:
return widget

widget = widget.parent()

return None

def doesEventHitWidgetOrChildren(event, widget):
"""
Test if event 'event' hits widget 'widget' (or *any* of its children)
"""
if event.target == widget.element:
return widget

for child in widget.children():
if doesEventHitWidgetOrChildren(event, child):
return child

return None

def textToHtml(node, text):
"""
Generates html nodes from text by splitting text into content and into
line breaks html5.Br.

:param node: The node where the nodes are appended to.
:param text: The text to be inserted.
"""

for (i, part) in enumerate(text.split("\n")):
if i > 0:
node.appendChild(html5.Br())

node.appendChild(html5.TextNode(part))

def parseInt(s, ret = 0):
"""
Parses a value as int
"""
if not isinstance(s, str):
return int(s)
elif s:
if s[0] in "+-":
ts = s[1:]
else:
ts = s

if ts and all([_ in "0123456789" for _ in ts]):
return int(s)

return ret

def parseFloat(s, ret = 0.0):
"""
Parses a value as float.
"""
if not isinstance(s, str):
return float(s)
elif s:
if s[0] in "+-":
ts = s[1:]
else:
ts = s

if ts and ts.count(".") <= 1 and all([_ in ".0123456789" for _ in ts]):
return float(s)

return ret
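
The lenient `parseInt`/`parseFloat` helpers above fall back to a default instead of raising on malformed input. A standalone sketch of that behaviour, for illustration only (not part of this diff):

```python
def lenient_int(s, fallback=0):
    # Mirrors parseInt() above: accept an optional sign, digits only,
    # and return a fallback value instead of raising on bad input.
    if not isinstance(s, str):
        return int(s)
    body = s[1:] if s[:1] in "+-" else s
    if body and all(ch in "0123456789" for ch in body):
        return int(s)
    return fallback

assert lenient_int("42") == 42
assert lenient_int("-7") == -7
assert lenient_int("4.5") == 0                 # not an integer literal -> fallback
assert lenient_int("abc", fallback=-1) == -1
```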

BIN
docs/ide/is-loading.gif View File

Width: 43  |  Height: 11  |  Size: 404 B

BIN
docs/ide/lark-logo.png View File

Width: 198  |  Height: 98  |  Size: 13 KiB

+ 0
- 53
docs/index.md View File

@@ -1,53 +0,0 @@
# Lark

A modern parsing library for Python

## Overview

Lark can parse any context-free grammar.

Lark provides:

- Advanced grammar language, based on EBNF
- Three parsing algorithms to choose from: Earley, LALR(1) and CYK
- Automatic tree construction, inferred from your grammar
- Fast unicode lexer with regexp support, and automatic line-counting

Lark's code is hosted on Github: [https://github.com/lark-parser/lark](https://github.com/lark-parser/lark)

### Install
```bash
$ pip install lark-parser
```

#### Syntax Highlighting

- [Sublime Text & TextMate](https://github.com/lark-parser/lark_syntax)
- [Visual Studio Code](https://github.com/lark-parser/vscode-lark) (Or install through the vscode plugin system)
- [Intellij & PyCharm](https://github.com/lark-parser/intellij-syntax-highlighting)

-----

## Documentation Index


* [Philosophy & Design Choices](philosophy.md)
* [Full List of Features](features.md)
* [Examples](https://github.com/lark-parser/lark/tree/master/examples)
* Tutorials
* [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter
* [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark
* Unofficial
* [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3
* Guides
* [How to use Lark](how_to_use.md)
* [How to develop Lark](how_to_develop.md)
* Reference
* [Grammar](grammar.md)
* [Tree Construction](tree_construction.md)
* [Visitors & Transformers](visitors.md)
* [Classes](classes.md)
* [Cheatsheet (PDF)](lark_cheatsheet.pdf)
* Discussion
* [Gitter](https://gitter.im/lark-parser/Lobby)
* [Forum (Google Groups)](https://groups.google.com/forum/#!forum/lark-parser)

+ 113
- 0
docs/index.rst View File

@@ -0,0 +1,113 @@
.. Lark documentation master file, created by
sphinx-quickstart on Sun Aug 16 13:09:41 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Welcome to Lark's documentation!
================================

.. toctree::
:maxdepth: 2
:caption: Overview
:hidden:

philosophy
features
parsers

.. toctree::
:maxdepth: 2
:caption: Tutorials & Guides
:hidden:

json_tutorial
how_to_use
how_to_develop
recipes
examples/index


.. toctree::
:maxdepth: 2
:caption: Reference
:hidden:

grammar
tree_construction
classes
visitors
nearley



Lark is a modern parsing library for Python. Lark can parse any context-free grammar.

Lark provides:

- Advanced grammar language, based on EBNF
- Three parsing algorithms to choose from: Earley, LALR(1) and CYK
- Automatic tree construction, inferred from your grammar
- Fast unicode lexer with regexp support, and automatic line-counting


Install Lark
--------------

.. code:: bash

$ pip install lark-parser

Syntax Highlighting
-------------------

- `Sublime Text & TextMate`_
- `Visual Studio Code`_ (Or install through the vscode plugin system)
- `Intellij & PyCharm`_

.. _Sublime Text & TextMate: https://github.com/lark-parser/lark_syntax
.. _Visual Studio Code: https://github.com/lark-parser/vscode-lark
.. _Intellij & PyCharm: https://github.com/lark-parser/intellij-syntax-highlighting

Resources
---------

- :doc:`philosophy`
- :doc:`features`
- `Examples`_
- `Online IDE`_
- Tutorials

- `How to write a DSL`_ - Implements a toy LOGO-like language with
an interpreter
- :doc:`json_tutorial` - Teaches you how to use Lark
- Unofficial

- `Program Synthesis is Possible`_ - Creates a DSL for Z3

- Guides

- :doc:`how_to_use`
- :doc:`how_to_develop`

- Reference

- :doc:`grammar`
- :doc:`tree_construction`
- :doc:`visitors`
- :doc:`classes`
- :doc:`nearley`
- `Cheatsheet (PDF)`_

- Discussion

- `Gitter`_
- `Forum (Google Groups)`_


.. _Examples: https://github.com/lark-parser/lark/tree/master/examples
.. _Online IDE: https://lark-parser.github.io/lark/ide/app.html
.. _How to write a DSL: http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/
.. _Program Synthesis is Possible: https://www.cs.cornell.edu/~asampson/blog/minisynth.html
.. _Cheatsheet (PDF): _static/lark_cheatsheet.pdf
.. _Gitter: https://gitter.im/lark-parser/Lobby
.. _Forum (Google Groups): https://groups.google.com/forum/#!forum/lark-parser

+ 1
- 2
docs/json_tutorial.md View File

@@ -1,7 +1,6 @@
# Lark Tutorial - JSON parser
# JSON parser - Tutorial

Lark is a parser - a program that accepts a grammar and text, and produces a structured tree that represents that text.

In this tutorial we will write a JSON parser in Lark, and explore Lark's various features in the process.

It has 5 parts.


+ 36
- 0
docs/make.bat View File

@@ -0,0 +1,36 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
set SPHINXPROJ=Lark
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd

+ 47
- 0
docs/nearley.md View File

@@ -0,0 +1,47 @@
# Importing grammars from Nearley

Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for JavaScript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the JavaScript postprocessing code segments.

## Requirements

1. Install Lark with the `nearley` component:
```bash
pip install lark-parser[nearley]
```

2. Acquire a copy of the nearley codebase. This can be done using:
```bash
git clone https://github.com/Hardmath123/nearley
```

## Usage

Here's an example of how to import nearley's calculator example into Lark:

```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
```

You can use the output as a regular python module:

```python
>>> import ncalc
>>> ncalc.parse('sin(pi/4) ^ e')
0.38981434460254655
```

The Nearley converter also includes experimental support for newer JavaScript (ES6+), enabled with the `--es6` flag:

```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley --es6 > ncalc.py
```

## Notes

- Lark currently cannot import templates from Nearley

- Lark currently cannot export grammars to Nearley

These might get added in the future, if enough users ask for them.

+ 5
- 5
docs/parsers.md View File

@@ -1,7 +1,7 @@
# Parsers
Lark implements the following parsing algorithms: Earley, LALR(1), and CYK

# Earley
## Earley

An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar at O(n^3), and O(n^2) when the grammar is unambiguous. It can parse most LR grammars at O(n). Most programming languages are LR, and can be parsed in linear time.

@@ -13,7 +13,7 @@ It's possible to bypass the dynamic lexing, and use the regular Earley parser wi

Lark implements the Shared Packed Parse Forest data-structure for the Earley parser, in order to reduce the space and computation required to handle ambiguous grammars.

You can read more about SPPF [here](http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/)
You can read more about SPPF [here](https://web.archive.org/web/20191229100607/www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest)

As a result, Lark can efficiently parse and store every ambiguity in the grammar, when using Earley.

@@ -30,7 +30,7 @@ Lark provides the following options to combat ambiguity:

**TODO: Add documentation on dynamic_complete**

# LALR(1)
## LALR(1)

[LALR(1)](https://www.wikiwand.com/en/LALR_parser) is a very efficient, tried-and-tested parsing algorithm. It's incredibly fast and requires very little memory. It can parse most programming languages (for example: Python and Java).

@@ -42,7 +42,7 @@ The contextual lexer communicates with the parser, and uses the parser's lookahe

This is an improvement to LALR(1) that is unique to Lark.

# CYK Parser
## CYK Parser

A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-free grammar at O(n^3*|G|).
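
To make the comparison above concrete, here is a minimal sketch of picking an algorithm through the Lark constructor; the toy grammar is illustrative and not taken from the docs:

```python
from lark import Lark

grammar = r"""
    start: WORD+
    %import common.WORD
    %import common.WS
    %ignore WS
"""

# Earley (the default) accepts any context-free grammar and can expose every
# derivation of an ambiguous input; LALR(1) is much faster but only accepts
# LALR grammars.
earley_parser = Lark(grammar, parser="earley", ambiguity="explicit")
lalr_parser = Lark(grammar, parser="lalr", lexer="contextual")

print(earley_parser.parse("fruit flies like bananas").pretty())
print(lalr_parser.parse("fruit flies like bananas").pretty())
```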



+ 2
- 2
docs/philosophy.md View File

@@ -4,7 +4,7 @@ Parsers are innately complicated and confusing. They're difficult to understand,

Lark's mission is to make the process of writing them as simple and abstract as possible, by following these design principles:

### Design Principles
## Design Principles

1. Readability matters

@@ -23,7 +23,7 @@ In accordance with these principles, I arrived at the following design choices:

-----------

# Design Choices
## Design Choices

### 1. Separation of code and grammar



+ 1
- 1
docs/recipes.md View File

@@ -139,7 +139,7 @@ If your tree nodes aren't unique (if there is a shared Tree instance), the asser

```python
class Parent(Visitor):
def visit(self, tree):
def __default__(self, tree):
for subtree in tree.children:
if isinstance(subtree, Tree):
assert not hasattr(subtree, 'parent')


+ 2
- 0
docs/requirements.txt View File

@@ -0,0 +1,2 @@
# https://docs.readthedocs.io/en/stable/guides/specifying-dependencies.html#specifying-a-requirements-file
sphinx-gallery

+ 3
- 3
docs/tree_construction.md View File

@@ -1,4 +1,4 @@
# Automatic Tree Construction - Reference
# Tree Construction Reference


Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.
@@ -13,7 +13,7 @@ If `maybe_placeholders=False` (the default), then `[]` behaves like `()?`.

If `maybe_placeholders=True`, then using `[item]` will return the item if it matched, or the value `None`, if it didn't.

### Terminals
## Terminals

Terminals are always values in the tree, never branches.

@@ -74,7 +74,7 @@ Lark will parse "((hello world))" as:
The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.


# Shaping the tree
## Shaping the tree

Users can alter the automatic construction of the tree using a collection of grammar features.
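
A minimal sketch of the `maybe_placeholders` difference described in this page (the grammar is illustrative):

```python
from lark import Lark

grammar = r"""
    start: A [B] C
    A: "a"
    B: "b"
    C: "c"
"""

# With placeholders, the unmatched optional item shows up as None ...
print(Lark(grammar, maybe_placeholders=True).parse("ac").children)
# roughly: [Token('A', 'a'), None, Token('C', 'c')]

# ... without them, it is simply absent (the current default).
print(Lark(grammar, maybe_placeholders=False).parse("ac").children)
# roughly: [Token('A', 'a'), Token('C', 'c')]
```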



+ 0
- 125
docs/visitors.md View File

@@ -1,125 +0,0 @@
## Transformers & Visitors

Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which allows to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.

See: <a href="https://github.com/lark-parser/lark/blob/master/lark/visitors.py">visitors.py</a>

### Visitors

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

**Example:**
```python
class IncreaseAllNumbers(Visitor):
def number(self, tree):
assert tree.data == "number"
tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)
```

There are two classes that implement the visitor interface:

* Visitor - Visit every node (without recursion)

* Visitor_Recursive - Visit every node using recursion. Slightly faster.

### Transformers

Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.

Transformers can be used to implement map & reduce patterns.

Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).

Transformers can be chained into a new transformer by using multiplication.

`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.


**Example:**
```python
from lark import Tree, Transformer

class EvalExpressions(Transformer):
def expr(self, args):
return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])
```

All these classes implement the transformer interface:

- Transformer - Recursively transforms the tree. This is the one you probably want.
- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances

### visit_tokens

By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).

**Example:**

```python
class T(Transformer):
INT = int
NUMBER = float
def NAME(self, name):
return lookup_dict.get(name, name)


T(visit_tokens=True).transform(tree)
```


### v_args

`v_args` is a decorator.

By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.

When used on a transformer/visitor class definition, it applies to all the callback methods inside it.

`v_args` accepts one of three flags:

- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
- `tree` - Provides the entire tree as the argument, instead of the children.

**Examples:**

```python
@v_args(inline=True)
class SolveArith(Transformer):
def add(self, left, right):
return left + right


class ReverseNotation(Transformer_InPlace):
@v_args(tree=True)
def tree_node(self, tree):
tree.children = tree.children[::-1]
```

### `__default__` and `__default_token__`
These are the functions that are called on if a function with a corresponding name has not been found.

- The `__default__` method has the signature `(data, children, meta)`, with `data` being the data attribute of the node. It defaults to reconstruct the Tree

- The `__default_token__` just takes the `Token` as an argument. It defaults to just return the argument.


### Discard

When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.



+ 102
- 0
docs/visitors.rst View File

@@ -0,0 +1,102 @@
Transformers & Visitors
=======================

Transformers & Visitors provide a convenient interface to process the
parse-trees that Lark returns.

They are used by inheriting from the correct class (visitor or transformer),
and implementing methods corresponding to the rule you wish to process. Each
method accepts the children as an argument. That can be modified using the
``v_args`` decorator, which allows inlining the arguments (akin to ``*args``),
or adding the tree ``meta`` property as an argument.

See: `visitors.py`_

.. _visitors.py: https://github.com/lark-parser/lark/blob/master/lark/visitors.py

Visitor
-------

Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

There are two classes that implement the visitor interface:

- ``Visitor``: Visit every node (without recursion)
- ``Visitor_Recursive``: Visit every node using recursion. Slightly faster.

Example:
::

class IncreaseAllNumbers(Visitor):
def number(self, tree):
assert tree.data == "number"
tree.children[0] += 1

IncreaseAllNumbers().visit(parse_tree)

.. autoclass:: lark.visitors.Visitor

.. autoclass:: lark.visitors.Visitor_Recursive

Interpreter
-----------

.. autoclass:: lark.visitors.Interpreter


Example:
::

class IncreaseSomeOfTheNumbers(Interpreter):
def number(self, tree):
tree.children[0] += 1

def skip(self, tree):
# skip this subtree. don't change any number node inside it.
pass

IncreaseSomeOfTheNumbers().visit(parse_tree)

Transformer
-----------

.. autoclass:: lark.visitors.Transformer
:members: __default__, __default_token__

Example:
::

from lark import Tree, Transformer

class EvalExpressions(Transformer):
def expr(self, args):
return eval(args[0])

t = Tree('a', [Tree('expr', ['1+2'])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])

Example:
::

class T(Transformer):
INT = int
NUMBER = float
def NAME(self, name):
return lookup_dict.get(name, name)

T(visit_tokens=True).transform(tree)


v_args
------

.. autofunction:: lark.visitors.v_args

Discard
-------

.. autoclass:: lark.visitors.Discard
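
The ``v_args`` examples from the removed Markdown page still apply; for reference, essentially as they appeared there:

```python
from lark import Transformer, v_args
from lark.visitors import Transformer_InPlace

@v_args(inline=True)
class SolveArith(Transformer):
    # children arrive as positional arguments instead of a single list
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        # the whole subtree is passed in, not just its children
        tree.children = tree.children[::-1]
```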

+ 0
- 34
examples/README.md View File

@@ -1,34 +0,0 @@
# Examples for Lark

#### How to run the examples

After cloning the repo, open the terminal into the root directory of the project, and run the following:

```bash
[lark]$ python -m examples.<name_of_example>
```

For example, the following will parse all the Python files in the standard library of your local installation:

```bash
[lark]$ python -m examples.python_parser
```

### Beginners

- [calc.py](calc.py) - A simple example of a REPL calculator
- [json\_parser.py](json_parser.py) - A simple JSON parser (comes with a tutorial, see docs)
- [indented\_tree.py](indented\_tree.py) - A demonstration of parsing indentation ("whitespace significant" language)
- [fruitflies.py](fruitflies.py) - A demonstration of ambiguity
- [turtle\_dsl.py](turtle_dsl.py) - Implements a LOGO-like toy language for Python's turtle, with interpreter.
- [lark\_grammar.py](lark_grammar.py) + [lark.lark](lark.lark) - A reference implementation of the Lark grammar (using LALR(1) + standard lexer)

### Advanced

- [error\_reporting\_lalr.py](error_reporting_lalr.py) - A demonstration of example-driven error reporting with the LALR parser
- [python\_parser.py](python_parser.py) - A fully-working Python 2 & 3 parser (but not production ready yet!)
- [python\_bytecode.py](python_bytecode.py) - A toy example showing how to compile Python directly to bytecode
- [conf\_lalr.py](conf_lalr.py) - Demonstrates the power of LALR's contextual lexer on a toy configuration language
- [conf\_earley.py](conf_earley.py) - Demonstrates the power of Earley's dynamic lexer on a toy configuration language
- [custom\_lexer.py](custom_lexer.py) - Demonstrates using a custom lexer to parse a non-textual stream of data
- [reconstruct\_json.py](reconstruct_json.py) - Demonstrates the experimental text-reconstruction feature

+ 21
- 0
examples/README.rst View File

@@ -0,0 +1,21 @@
Examples for Lark
=================

**How to run the examples**:

After cloning the repo, open a terminal in the root directory of the
project, and run the following:

.. code:: bash

[lark]$ python -m examples.<name_of_example>

For example, the following will parse all the Python files in the
standard library of your local installation:

.. code:: bash

[lark]$ python -m examples.python_parser

Beginner Examples
~~~~~~~~~~~~~~~~~

+ 2
- 0
examples/advanced/README.rst View File

@@ -0,0 +1,2 @@
Advanced Examples
~~~~~~~~~~~~~~~~~

+ 64
- 0
examples/advanced/_json_parser.py View File

@@ -0,0 +1,64 @@
"""
Simple JSON Parser
==================

The code is short and clear, and outperforms every other parser (that's written in Python).
For an explanation, check out the JSON parser tutorial at /docs/json_tutorial.md

(this is here for use by the other examples)
"""
import sys

from lark import Lark, Transformer, v_args

json_grammar = r"""
?start: value

?value: object
| array
| string
| SIGNED_NUMBER -> number
| "true" -> true
| "false" -> false
| "null" -> null

array : "[" [value ("," value)*] "]"
object : "{" [pair ("," pair)*] "}"
pair : string ":" value

string : ESCAPED_STRING

%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS

%ignore WS
"""


class TreeToJson(Transformer):
@v_args(inline=True)
def string(self, s):
return s[1:-1].replace('\\"', '"')

array = list
pair = tuple
object = dict
number = v_args(inline=True)(float)

null = lambda self, _: None
true = lambda self, _: True
false = lambda self, _: False


### Create the JSON parser with Lark, using the LALR algorithm
json_parser = Lark(json_grammar, parser='lalr',
# Using the standard lexer isn't required, and isn't usually recommended.
# But, it's good enough for JSON, and it's slightly faster.
lexer='standard',
# Disabling propagate_positions and placeholders slightly improves speed
propagate_positions=False,
maybe_placeholders=False,
# Using an internal transformer is faster and more memory efficient
transformer=TreeToJson())


+ 44
- 0
examples/advanced/conf_earley.py View File

@@ -0,0 +1,44 @@
"""
Earley’s dynamic lexer
======================

Demonstrates the power of Earley’s dynamic lexer on a toy configuration language

Using a lexer for configuration files is tricky, because values don't
have to be surrounded by delimiters. Using a standard lexer for this just won't work.

In this example we use a dynamic lexer and let the Earley parser resolve the ambiguity.

Another approach is to use the contextual lexer with LALR. It is less powerful than Earley,
but it can handle some ambiguity when lexing and it's much faster.
See examples/conf_lalr.py for an example of that approach.

"""
from lark import Lark

parser = Lark(r"""
start: _NL? section+
section: "[" NAME "]" _NL item+
item: NAME "=" VALUE? _NL
VALUE: /./+

%import common.CNAME -> NAME
%import common.NEWLINE -> _NL
%import common.WS_INLINE
%ignore WS_INLINE
""", parser="earley")

def test():
sample_conf = """
[bla]

a=Hello
this="that",4
empty=
"""

r = parser.parse(sample_conf)
print (r.pretty())

if __name__ == '__main__':
test()

+ 40
- 0
examples/advanced/conf_lalr.py View File

@@ -0,0 +1,40 @@
"""
LALR’s contextual lexer
=======================

Demonstrates the power of LALR’s contextual lexer on a toy configuration language.

The tokens NAME and VALUE match the same input. A standard lexer would arbitrarily
choose one over the other, which would lead to a (confusing) parse error.
However, due to the unambiguous structure of the grammar, Lark's LALR(1) algorithm knows
which one of them to expect at each point during the parse.
The lexer then only matches the tokens that the parser expects.
The result is a correct parse, something that is impossible with a regular lexer.

Another approach is to discard a lexer altogether and use the Earley algorithm.
It will handle more cases than the contextual lexer, but at the cost of performance.
See examples/conf_earley.py for an example of that approach.
"""
from lark import Lark

parser = Lark(r"""
start: _NL? section+
section: "[" NAME "]" _NL item+
item: NAME "=" VALUE? _NL
VALUE: /./+

%import common.CNAME -> NAME
%import common.NEWLINE -> _NL
%import common.WS_INLINE
%ignore WS_INLINE
""", parser="lalr")


sample_conf = """
[bla]
a=Hello
this="that",4
empty=
"""

print(parser.parse(sample_conf).pretty())

examples/custom_lexer.py → examples/advanced/custom_lexer.py View File

@@ -1,13 +1,14 @@
#
# This example demonstrates using Lark with a custom lexer.
#
# You can use a custom lexer to tokenize text when the lexers offered by Lark
# are too slow, or not flexible enough.
#
# You can also use it (as shown in this example) to tokenize streams of objects.
#
"""
Custom lexer
============

Demonstrates using a custom lexer to parse a non-textual stream of data

You can use a custom lexer to tokenize text when the lexers offered by Lark
are too slow, or not flexible enough.

You can also use it (as shown in this example) to tokenize streams of objects.
"""
from lark import Lark, Transformer, v_args
from lark.lexer import Lexer, Token


+ 37
- 0
examples/advanced/error_puppet.py View File

@@ -0,0 +1,37 @@
"""
Error handling with a puppet
==================================

This example demonstrates error handling using a parsing puppet in LALR

When the parser encounters an UnexpectedToken exception, it creates a
parsing puppet with the current parse-state, and lets you control how
to proceed step-by-step. When you've achieved the correct parse-state,
you can resume the run by returning True.
"""

from lark import Token

from _json_parser import json_parser

def ignore_errors(e):
if e.token.type == 'COMMA':
# Skip comma
return True
elif e.token.type == 'SIGNED_NUMBER':
# Try to feed a comma and retry the number
e.puppet.feed_token(Token('COMMA', ','))
e.puppet.feed_token(e.token)
return True

# Unhandled error. Will stop parse and raise exception
return False


def main():
s = "[0 1, 2,, 3,,, 4, 5 6 ]"
res = json_parser.parse(s, on_error=ignore_errors)
print(res) # prints [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

main()


examples/error_reporting_lalr.py → examples/advanced/error_reporting_lalr.py View File

@@ -1,10 +1,13 @@
#
# This demonstrates example-driven error reporting with the LALR parser
#
"""
Example-Driven Error Reporting
==============================

A demonstration of example-driven error reporting with the LALR parser

"""
from lark import Lark, UnexpectedInput

from .json_parser import json_grammar # Using the grammar from the json_parser example
from _json_parser import json_grammar # Using the grammar from the json_parser example

json_parser = Lark(json_grammar, parser='lalr')

@@ -52,7 +55,7 @@ def parse(json_text):
'[1,2,]',
'{"foo":1,}',
'{"foo":false,"bar":true,}']
})
}, use_accepts=True)
if not exc_class:
raise
raise exc_class(u.get_context(json_text), u.line, u.column)

examples/python2.lark → examples/advanced/python2.lark View File


examples/python3.lark → examples/advanced/python3.lark View File


examples/python_bytecode.py → examples/advanced/python_bytecode.py View File

@@ -1,12 +1,16 @@
#
# This is a toy example that compiles Python directly to bytecode, without generating an AST.
# It currently only works for very very simple Python code.
#
# It requires the 'bytecode' library. You can get it using
#
# $ pip install bytecode
#
"""
Compile Python to Bytecode
==========================
A toy example that compiles Python directly to bytecode, without generating an AST.
It currently only works for very, very simple Python code.

It requires the 'bytecode' library. You can get it using
::

$ pip install bytecode

"""
from lark import Lark, Transformer, v_args
from lark.indenter import Indenter


examples/python_parser.py → examples/advanced/python_parser.py View File

@@ -1,7 +1,11 @@
#
# This example demonstrates usage of the included Python grammars
#
"""
Grammar-complete Python Parser
==============================

A fully-working Python 2 & 3 parser (but not production ready yet!)

This example demonstrates usage of the included Python grammars
"""
import sys
import os, os.path
from io import open
@@ -26,6 +30,13 @@ python_parser2 = Lark.open('python2.lark', parser='lalr', **kwargs)
python_parser3 = Lark.open('python3.lark',parser='lalr', **kwargs)
python_parser2_earley = Lark.open('python2.lark', parser='earley', lexer='standard', **kwargs)

try:
xrange
except NameError:
chosen_parser = python_parser3
else:
chosen_parser = python_parser2


def _read(fn, *args):
kwargs = {'encoding': 'iso-8859-1'}
@@ -42,24 +53,13 @@ def _get_lib_path():
return [x for x in sys.path if x.endswith('%s.%s' % sys.version_info[:2])][0]

def test_python_lib():

path = _get_lib_path()

start = time.time()
files = glob.glob(path+'/*.py')
for f in files:
print( f )
try:
# print list(python_parser.lex(_read(os.path.join(path, f)) + '\n'))
try:
xrange
except NameError:
python_parser3.parse(_read(os.path.join(path, f)) + '\n')
else:
python_parser2.parse(_read(os.path.join(path, f)) + '\n')
except:
print ('At %s' % f)
raise
chosen_parser.parse(_read(os.path.join(path, f)) + '\n')

end = time.time()
print( "test_python_lib (%d files), time: %s secs"%(len(files), end-start) )

examples/qscintilla_json.py → examples/advanced/qscintilla_json.py View File

@@ -1,10 +1,14 @@
#
# This example shows how to write a syntax-highlighted editor with Qt and Lark
#
# Requirements:
#
# PyQt5==5.10.1
# QScintilla==2.10.4
"""
Syntax Highlighting
===================

This example shows how to write a syntax-highlighted editor with Qt and Lark

Requirements:

PyQt5==5.10.1
QScintilla==2.10.4
"""

import sys
import textwrap

examples/reconstruct_json.py → examples/advanced/reconstruct_json.py View File

@@ -1,16 +1,20 @@
#
# This example demonstrates an experimental feature: Text reconstruction
# The Reconstructor takes a parse tree (already filtered from punctuation, of course),
# and reconstructs it into correct text, that can be parsed correctly.
# It can be useful for creating "hooks" to alter data before handing it to other parsers. You can also use it to generate samples from scratch.
#
"""
Reconstruct a JSON
==================

Demonstrates the experimental text-reconstruction feature

The Reconstructor takes a parse tree (already filtered from punctuation, of course),
and reconstructs it into correct text, that can be parsed correctly.
It can be useful for creating "hooks" to alter data before handing it to other parsers. You can also use it to generate samples from scratch.
"""

import json

from lark import Lark
from lark.reconstruct import Reconstructor

from .json_parser import json_grammar
from _json_parser import json_grammar

test_json = '''
{

examples/template_lark.lark → examples/advanced/template_lark.lark View File


+ 29
- 0
examples/advanced/templates.py View File

@@ -0,0 +1,29 @@
"""
Templates
=========

This example shows how to use Lark's templates to achieve cleaner grammars

"""
from lark import Lark

grammar = r"""
start: list | dict

list: "[" _seperated{atom, ","} "]"
dict: "{" _seperated{key_value, ","} "}"
key_value: atom ":" atom

_seperated{x, sep}: x (sep x)* // Define a sequence of 'x sep x sep x ...'

atom: NUMBER | ESCAPED_STRING

%import common (NUMBER, ESCAPED_STRING, WS)
%ignore WS
"""


parser = Lark(grammar)

print(parser.parse('[1, "a", 2]'))
print(parser.parse('{"a": 2, "b": 6}'))

+ 7
- 3
examples/calc.py View File

@@ -1,7 +1,11 @@
#
# This example shows how to write a basic calculator with variables.
#
"""
Basic calculator
================

A simple example of a REPL calculator

This example shows how to write a basic calculator with variables.
"""
from lark import Lark, Transformer, v_args




+ 0
- 42
examples/conf_earley.py View File

@@ -1,42 +0,0 @@
#
# This example demonstrates parsing using the dynamic-lexer earley frontend
#
# Using a lexer for configuration files is tricky, because values don't
# have to be surrounded by delimiters. Using a standard lexer for this just won't work.
#
# In this example we use a dynamic lexer and let the Earley parser resolve the ambiguity.
#
# Another approach is to use the contextual lexer with LALR. It is less powerful than Earley,
# but it can handle some ambiguity when lexing and it's much faster.
# See examples/conf_lalr.py for an example of that approach.
#


from lark import Lark

parser = Lark(r"""
start: _NL? section+
section: "[" NAME "]" _NL item+
item: NAME "=" VALUE? _NL
VALUE: /./+

%import common.CNAME -> NAME
%import common.NEWLINE -> _NL
%import common.WS_INLINE
%ignore WS_INLINE
""", parser="earley")

def test():
sample_conf = """
[bla]

a=Hello
this="that",4
empty=
"""

r = parser.parse(sample_conf)
print (r.pretty())

if __name__ == '__main__':
test()

+ 0
- 38
examples/conf_lalr.py View File

@@ -1,38 +0,0 @@
#
# This example demonstrates the power of the contextual lexer, by parsing a config file.
#
# The tokens NAME and VALUE match the same input. A standard lexer would arbitrarily
# choose one over the other, which would lead to a (confusing) parse error.
# However, due to the unambiguous structure of the grammar, Lark's LALR(1) algorithm knows
# which one of them to expect at each point during the parse.
# The lexer then only matches the tokens that the parser expects.
# The result is a correct parse, something that is impossible with a regular lexer.
#
# Another approach is to discard a lexer altogether and use the Earley algorithm.
# It will handle more cases than the contextual lexer, but at the cost of performance.
# See examples/conf_earley.py for an example of that approach.
#

from lark import Lark

parser = Lark(r"""
start: _NL? section+
section: "[" NAME "]" _NL item+
item: NAME "=" VALUE? _NL
VALUE: /./+

%import common.CNAME -> NAME
%import common.NEWLINE -> _NL
%import common.WS_INLINE
%ignore WS_INLINE
""", parser="lalr")


sample_conf = """
[bla]
a=Hello
this="that",4
empty=
"""

print(parser.parse(sample_conf).pretty())

+ 12
- 3
examples/fruitflies.py View File

@@ -1,7 +1,12 @@
#
# This example shows how to use get explicit ambiguity from Lark's Earley parser.
#
"""
Handling Ambiguity
==================

A demonstration of ambiguity

This example shows how to get explicit ambiguity from Lark's Earley parser.

"""
import sys
from lark import Lark, tree

@@ -28,9 +33,13 @@ sentence = 'fruit flies like bananas'
def make_png(filename):
tree.pydot__tree_to_png( parser.parse(sentence), filename)

def make_dot(filename):
tree.pydot__tree_to_dot( parser.parse(sentence), filename)

if __name__ == '__main__':
print(parser.parse(sentence).pretty())
# make_png(sys.argv[1])
# make_dot(sys.argv[1])

# Output:
#


+ 12
- 9
examples/indented_tree.py View File

@@ -1,13 +1,16 @@
#
# This example demonstrates usage of the Indenter class.
#
# Since indentation is context-sensitive, a postlex stage is introduced to
# manufacture INDENT/DEDENT tokens.
#
# It is crucial for the indenter that the NL_type matches
# the spaces (and tabs) after the newline.
#
"""
Parsing Indentation
===================

A demonstration of parsing indentation (“whitespace significant” language)
and the usage of the Indenter class.

Since indentation is context-sensitive, a postlex stage is introduced to
manufacture INDENT/DEDENT tokens.

It is crucial for the indenter that the NL_type matches
the spaces (and tabs) after the newline.
"""
from lark import Lark
from lark.indenter import Indenter



+ 6
- 6
examples/json_parser.py View File

@@ -1,10 +1,10 @@
#
# This example shows how to write a basic JSON parser
#
# The code is short and clear, and outperforms every other parser (that's written in Python).
# For an explanation, check out the JSON parser tutorial at /docs/json_tutorial.md
#
"""
Simple JSON Parser
==================

The code is short and clear, and outperforms every other parser (that's written in Python).
For an explanation, check out the JSON parser tutorial at /docs/json_tutorial.md
"""
import sys

from lark import Lark, Transformer, v_args


+ 19
- 9
examples/lark_grammar.py View File

@@ -1,15 +1,25 @@
from lark import Lark
"""
Lark Grammar
============

parser = Lark(open('examples/lark.lark'), parser="lalr")
A reference implementation of the Lark grammar (using LALR(1))
"""
import lark
from pathlib import Path

parser = lark.Lark.open('lark.lark', rel_to=__file__, parser="lalr")

examples_path = Path(__file__).parent
lark_path = Path(lark.__file__).parent

grammar_files = [
'examples/python2.lark',
'examples/python3.lark',
'examples/lark.lark',
'examples/relative-imports/multiples.lark',
'examples/relative-imports/multiple2.lark',
'examples/relative-imports/multiple3.lark',
'lark/grammars/common.lark',
examples_path / 'lark.lark',
examples_path / 'advanced/python2.lark',
examples_path / 'advanced/python3.lark',
examples_path / 'relative-imports/multiples.lark',
examples_path / 'relative-imports/multiple2.lark',
examples_path / 'relative-imports/multiple3.lark',
lark_path / 'grammars/common.lark',
]

def test():


+ 636
- 354
examples/standalone/json_parser.py
File diff suppressed because it is too large
View File


+ 6
- 1
examples/turtle_dsl.py View File

@@ -1,4 +1,9 @@
# This example implements a LOGO-like toy language for Python's turtle, with interpreter.
"""
Turtle DSL
==========

Implements a LOGO-like toy language for Python’s turtle, with interpreter.
"""

try:
input = raw_input # For Python2 compatibility


+ 20
- 10
lark-stubs/exceptions.pyi View File

@@ -1,9 +1,9 @@
# -*- coding: utf-8 -*-

from typing import Dict, Iterable, Callable, Union
from typing import Dict, Iterable, Callable, Union, TypeVar, Tuple, Any, List, Set
from .tree import Tree
from .lexer import Token
from .parsers.lalr_puppet import ParserPuppet

class LarkError(Exception):
pass
@@ -21,27 +21,37 @@ class LexError(LarkError):
pass


T = TypeVar('T')


class UnexpectedInput(LarkError):
line: int
column: int
pos_in_stream: int
state: Any

def get_context(self, text: str, span: int = ...):
...

def match_examples(
self,
parse_fn: Callable[[str], Tree],
examples: Dict[str, Iterable[str]]
):
self,
parse_fn: Callable[[str], Tree],
examples: Union[Dict[T, Iterable[str]], Iterable[Tuple[T, Iterable[str]]]],
token_type_match_fallback: bool = False,
use_accepts: bool = False,
) -> T:
...


class UnexpectedToken(ParseError, UnexpectedInput):
pass

expected: Set[str]
considered_rules: Set[str]
puppet: ParserPuppet
accepts: Set[str]

class UnexpectedCharacters(LexError, UnexpectedInput):
line: int
column: int
allowed: Set[str]
considered_tokens: Set[Any]


class VisitError(LarkError):


+ 7
- 1
lark-stubs/lark.pyi View File

@@ -23,6 +23,7 @@ class LarkOptions:
transformer: Optional[Transformer]
postlex: Optional[PostLex]
ambiguity: str
regex: bool
debug: bool
keep_all_tokens: bool
propagate_positions: bool
@@ -30,10 +31,12 @@ class LarkOptions:
lexer_callbacks: Dict[str, Callable[[Token], Token]]
cache: Union[bool, str]
g_regex_flags: int
use_bytes: bool


class Lark:
source: str
grammar_source: str
options: LarkOptions
lexer: Lexer
terminals: List[TerminalDef]
@@ -48,12 +51,15 @@ class Lark:
transformer: Optional[Transformer] = None,
postlex: Optional[PostLex] = None,
ambiguity: Literal["explicit", "resolve"] = "resolve",
regex: bool = False,
debug: bool = False,
keep_all_tokens: bool = False,
propagate_positions: bool = False,
maybe_placeholders: bool = False,
lexer_callbacks: Optional[Dict[str, Callable[[Token], Token]]] = None,
g_regex_flags: int = ...
cache: Union[bool, str] = False,
g_regex_flags: int = ...,
use_bytes: bool = False,
):
...
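
A sketch exercising the options newly declared in this stub (`regex`, `cache`, `g_regex_flags`, `use_bytes`); the grammar and option values are illustrative:

```python
from lark import Lark

parser = Lark(r"""
    start: WORD+
    %import common.WORD
    %import common.WS
    %ignore WS
""",
    parser="lalr",
    regex=False,        # True switches from the stdlib `re` to the `regex` module
    cache=True,         # cache the LALR analysis for faster subsequent loads
    g_regex_flags=0,    # flags applied to every terminal
    use_bytes=False,    # True accepts a `bytes` input instead of `str`
)

print(parser.parse("hello world").pretty())
```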



+ 4
- 1
lark-stubs/lexer.pyi View File

@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
from types import ModuleType
from typing import (
TypeVar, Type, Tuple, List, Dict, Iterator, Collection, Callable, Optional,
Pattern as REPattern,
@@ -107,10 +107,12 @@ class TraditionalLexer(Lexer):
user_callbacks: Dict[str, _Callback]
callback: Dict[str, _Callback]
mres: List[Tuple[REPattern, Dict[int, str]]]
re: ModuleType

def __init__(
self,
terminals: Collection[TerminalDef],
re_: ModuleType,
ignore: Collection[str] = ...,
user_callbacks: Dict[str, _Callback] = ...,
g_regex_flags: int = ...
@@ -135,6 +137,7 @@ class ContextualLexer(Lexer):
self,
terminals: Collection[TerminalDef],
states: Dict[str, Collection[str]],
re_: ModuleType,
ignore: Collection[str] = ...,
always_accept: Collection[str] = ...,
user_callbacks: Dict[str, _Callback] = ...,


+ 0
- 0
lark-stubs/parsers/__init__.pyi View File


+ 22
- 0
lark-stubs/parsers/lalr_puppet.pyi View File

@@ -0,0 +1,22 @@
from typing import Set, Dict, Any

from lark import Token, Tree


class ParserPuppet(object):
"""
Provides an interface to interactively step through the parser (LALR(1) only for now)

Accessible via `UnexpectedToken.puppet` (raised by the parser on token error)
"""
def feed_token(self, token: Token): ...

def copy(self) -> ParserPuppet: ...

def pretty(self) -> str: ...

def choices(self) -> Dict[str, Any]: ...

def accepts(self) -> Set[str]: ...

def resume_parse(self) -> Tree: ...
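
A sketch of driving the puppet from an `on_error` callback, in the spirit of `examples/advanced/error_puppet.py` above; the grammar and the assumed terminal name `B` are illustrative:

```python
from lark import Lark, Token

parser = Lark(r"""
    start: "a" "b" "c"
    %import common.WS
    %ignore WS
""", parser="lalr", lexer="standard")

def handle(e):
    # `e.puppet` holds the live parser state at the point of failure.
    print(e.puppet.pretty())               # readable dump of the parse stacks
    print(e.puppet.accepts())              # terminal names acceptable right here
    e.puppet.feed_token(Token("B", "b"))   # inject the token the input was missing...
    e.puppet.feed_token(e.token)           # ...then replay the token that failed
    return True                            # and let the parser resume

print(parser.parse("a c", on_error=handle).pretty())
```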

+ 1
- 1
lark-stubs/reconstruct.pyi View File

@@ -30,7 +30,7 @@ class MakeMatchTree:

class Reconstructor:

def __init__(self, parser: Lark):
def __init__(self, parser: Lark, term_subs: Dict[str, str] = ...):
...

def reconstruct(self, tree: Tree) -> str:


+ 2
- 1
lark/__init__.py View File

@@ -1,3 +1,4 @@
from .utils import logger
from .tree import Tree
from .visitors import Transformer, Visitor, v_args, Discard
from .visitors import InlineTransformer, inline_args # XXX Deprecated
@@ -6,4 +7,4 @@ from .exceptions import (ParseError, LexError, GrammarError, UnexpectedToken,
from .lexer import Token
from .lark import Lark

__version__ = "0.8.5"
__version__ = "0.9.0"

+ 6
- 6
lark/common.py View File

@@ -4,18 +4,18 @@ from .lexer import TerminalDef
###{standalone

class LexerConf(Serialize):
__serialize_fields__ = 'tokens', 'ignore', 'g_regex_flags'
__serialize_fields__ = 'tokens', 'ignore', 'g_regex_flags', 'use_bytes'
__serialize_namespace__ = TerminalDef,

def __init__(self, tokens, ignore=(), postlex=None, callbacks=None, g_regex_flags=0):
self.tokens = tokens
def __init__(self, tokens, re_module, ignore=(), postlex=None, callbacks=None, g_regex_flags=0, skip_validation=False, use_bytes=False):
self.tokens = tokens # TODO should be terminals
self.ignore = ignore
self.postlex = postlex
self.callbacks = callbacks or {}
self.g_regex_flags = g_regex_flags
def _deserialize(self):
self.callbacks = {} # TODO
self.re_module = re_module
self.skip_validation = skip_validation
self.use_bytes = use_bytes

###}



+ 95
- 26
lark/exceptions.py View File

@@ -1,6 +1,8 @@
from .utils import STRING_TYPE
from .utils import STRING_TYPE, logger

###{standalone


class LarkError(Exception):
pass

@@ -22,54 +24,109 @@ class UnexpectedEOF(ParseError):


class UnexpectedInput(LarkError):
"""UnexpectedInput Error.

Used as a base class for the following exceptions:

- ``UnexpectedToken``: The parser received an unexpected token
- ``UnexpectedCharacters``: The lexer encountered an unexpected string

After catching one of these exceptions, you may call the following helper methods to create a nicer error message.
"""
pos_in_stream = None

def get_context(self, text, span=40):
"""Returns a pretty string pinpointing the error in the text,
with span amount of context characters around it.

Note:
The parser doesn't hold a copy of the text it has to parse,
so you have to provide it again
"""
pos = self.pos_in_stream
start = max(pos - span, 0)
end = pos + span
before = text[start:pos].rsplit('\n', 1)[-1]
after = text[pos:end].split('\n', 1)[0]
return before + after + '\n' + ' ' * len(before) + '^\n'

def match_examples(self, parse_fn, examples):
""" Given a parser instance and a dictionary mapping some label with
some malformed syntax examples, it'll return the label for the
example that bests matches the current error.
if not isinstance(text, bytes):
before = text[start:pos].rsplit('\n', 1)[-1]
after = text[pos:end].split('\n', 1)[0]
return before + after + '\n' + ' ' * len(before.expandtabs()) + '^\n'
else:
before = text[start:pos].rsplit(b'\n', 1)[-1]
after = text[pos:end].split(b'\n', 1)[0]
return (before + after + b'\n' + b' ' * len(before.expandtabs()) + b'^\n').decode("ascii", "backslashreplace")

def match_examples(self, parse_fn, examples, token_type_match_fallback=False, use_accepts=False):
"""Allows you to detect what's wrong in the input text by matching
against example errors.

Given a parser instance and a dictionary mapping some label with
some malformed syntax examples, it'll return the label for the
example that best matches the current error. The function will
iterate the dictionary until it finds a matching error, and
return the corresponding value.

For an example usage, see `examples/error_reporting_lalr.py`

Parameters:
parse_fn: parse function (usually ``lark_instance.parse``)
examples: dictionary of ``{'example_string': value}``.
use_accepts: Recommended to call this with ``use_accepts=True``.
The default is ``False`` for backwards compatibility.
"""
assert self.state is not None, "Not supported for this exception"

candidate = None
for label, example in examples.items():
if isinstance(examples, dict):
examples = examples.items()

candidate = (None, False)
for i, (label, example) in enumerate(examples):
assert not isinstance(example, STRING_TYPE)

for malformed in example:
for j, malformed in enumerate(example):
try:
parse_fn(malformed)
except UnexpectedInput as ut:
if ut.state == self.state:
if use_accepts and ut.accepts != self.accepts:
logger.debug("Different accepts with same state[%d]: %s != %s at example [%s][%s]" %
(self.state, self.accepts, ut.accepts, i, j))
continue
try:
if ut.token == self.token: # Try exact match first
logger.debug("Exact Match at example [%s][%s]" % (i, j))
return label

if token_type_match_fallback:
# Fallback to token types match
if (ut.token.type == self.token.type) and not candidate[-1]:
logger.debug("Token Type Fallback at example [%s][%s]" % (i, j))
candidate = label, True

except AttributeError:
pass
if not candidate:
candidate = label
if not candidate[0]:
logger.debug("Same State match at example [%s][%s]" % (i, j))
candidate = label, False

return candidate
return candidate[0]


class UnexpectedCharacters(LexError, UnexpectedInput):
def __init__(self, seq, lex_pos, line, column, allowed=None, considered_tokens=None, state=None, token_history=None):
message = "No terminal defined for '%s' at line %d col %d" % (seq[lex_pos], line, column)

self.line = line
self.column = column
self.allowed = allowed
self.considered_tokens = considered_tokens
self.pos_in_stream = lex_pos
self.state = state

self.allowed = allowed
self.considered_tokens = considered_tokens

if isinstance(seq, bytes):
_s = seq[lex_pos:lex_pos+1].decode("ascii", "backslashreplace")
else:
_s = seq[lex_pos]

message = "No terminal defined for '%s' at line %d col %d" % (_s, line, column)
message += '\n\n' + self.get_context(seq)
if allowed:
message += '\nExpecting: %s\n' % allowed
@@ -79,23 +136,35 @@ class UnexpectedCharacters(LexError, UnexpectedInput):
super(UnexpectedCharacters, self).__init__(message)



class UnexpectedToken(ParseError, UnexpectedInput):
def __init__(self, token, expected, considered_rules=None, state=None):
self.token = token
self.expected = expected # XXX str shouldn't necessary
"""When the parser throws UnexpectedToken, it instanciates a puppet
with its internal state. Users can then interactively set the puppet to
the desired puppet state, and resume regular parsing.

see: :ref:`ParserPuppet`.
"""
def __init__(self, token, expected, considered_rules=None, state=None, puppet=None):
self.line = getattr(token, 'line', '?')
self.column = getattr(token, 'column', '?')
self.considered_rules = considered_rules
self.state = state
self.pos_in_stream = getattr(token, 'pos_in_stream', None)
self.state = state

self.token = token
self.expected = expected # XXX deprecate? `accepts` is better
self.considered_rules = considered_rules
self.puppet = puppet

# TODO Only calculate `accepts()` when we need to display it to the user
# This will improve performance when doing automatic error handling
self.accepts = puppet and puppet.accepts()

message = ("Unexpected token %r at line %s, column %s.\n"
"Expected one of: \n\t* %s\n"
% (token, self.line, self.column, '\n\t* '.join(self.expected)))
% (token, self.line, self.column, '\n\t* '.join(self.accepts or self.expected)))

super(UnexpectedToken, self).__init__(message)


class VisitError(LarkError):
"""VisitError is raised when visitors are interrupted by an exception



+ 165
- 87
lark/lark.py View File

@@ -1,19 +1,25 @@
from __future__ import absolute_import

import sys, os, pickle, hashlib, logging
import sys, os, pickle, hashlib
from io import open


from .utils import STRING_TYPE, Serialize, SerializeMemoizer, FS
from .utils import STRING_TYPE, Serialize, SerializeMemoizer, FS, isascii, logger
from .load_grammar import load_grammar
from .tree import Tree
from .common import LexerConf, ParserConf

from .lexer import Lexer, TraditionalLexer, TerminalDef
from .lexer import Lexer, TraditionalLexer, TerminalDef, UnexpectedToken
from .parse_tree_builder import ParseTreeBuilder
from .parser_frontends import get_frontend
from .parser_frontends import get_frontend, _get_lexer_callbacks
from .grammar import Rule

import re
try:
import regex
except ImportError:
regex = None

###{standalone

class LarkOptions(Serialize):
@@ -21,61 +27,69 @@ class LarkOptions(Serialize):

"""
OPTIONS_DOC = """
# General

start - The start symbol. Either a string, or a list of strings for
multiple possible starts (Default: "start")
debug - Display debug information, such as warnings (default: False)
transformer - Applies the transformer to every parse tree (equivlent to
applying it after the parse, but faster)
propagate_positions - Propagates (line, column, end_line, end_column)
attributes into all tree branches.
maybe_placeholders - When True, the `[]` operator returns `None` when not matched.
When `False`, `[]` behaves like the `?` operator,
and returns no value at all.
(default=`False`. Recommended to set to `True`)
cache - Cache the results of the Lark grammar analysis, for x2 to x3 faster loading.
LALR only for now.
When `False`, does nothing (default)
When `True`, caches to a temporary file in the local directory
When given a string, caches to the path pointed by the string

g_regex_flags - Flags that are applied to all terminals
(both regex and strings)
keep_all_tokens - Prevent the tree builder from automagically
removing "punctuation" tokens (default: False)

# Algorithm

parser - Decides which parser engine to use
Accepts "earley" or "lalr". (Default: "earley")
(there is also a "cyk" option for legacy)

lexer - Decides whether or not to use a lexer stage
"auto" (default): Choose for me based on the parser
"standard": Use a standard lexer
"contextual": Stronger lexer (only works with parser="lalr")
"dynamic": Flexible and powerful (only with parser="earley")
"dynamic_complete": Same as dynamic, but tries *every* variation
of tokenizing possible.

ambiguity - Decides how to handle ambiguity in the parse.
Only relevant if parser="earley"
"resolve": The parser will automatically choose the simplest
derivation (it chooses consistently: greedy for
tokens, non-greedy for rules)
"explicit": The parser will return all derivations wrapped
in "_ambig" tree nodes (i.e. a forest).

# Domain Specific

postlex - Lexer post-processing (Default: None) Only works with the
standard and contextual lexers.
priority - How priorities should be evaluated - auto, none, normal,
invert (Default: auto)
lexer_callbacks - Dictionary of callbacks for the lexer. May alter
tokens during lexing. Use with caution.
edit_terminals - A callback
**=== General Options ===**

start
The start symbol. Either a string, or a list of strings for multiple possible starts (Default: "start")
debug
Display debug information, such as warnings (default: False)
transformer
Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster)
propagate_positions
Propagates (line, column, end_line, end_column) attributes into all tree branches.
maybe_placeholders
When True, the ``[]`` operator returns ``None`` when not matched.

When ``False``, ``[]`` behaves like the ``?`` operator, and returns no value at all.
(default= ``False``. Recommended to set to ``True``)
regex
When True, uses the ``regex`` module instead of the stdlib ``re``.
cache
Cache the results of the Lark grammar analysis, for x2 to x3 faster loading. LALR only for now.

- When ``False``, does nothing (default)
- When ``True``, caches to a temporary file in the local directory
- When given a string, caches to the path pointed by the string

g_regex_flags
Flags that are applied to all terminals (both regex and strings)
keep_all_tokens
Prevent the tree builder from automagically removing "punctuation" tokens (default: False)

**=== Algorithm Options ===**

parser
Decides which parser engine to use. Accepts "earley" or "lalr". (Default: "earley").
(there is also a "cyk" option for legacy)
lexer
Decides whether or not to use a lexer stage

- "auto" (default): Choose for me based on the parser
- "standard": Use a standard lexer
- "contextual": Stronger lexer (only works with parser="lalr")
- "dynamic": Flexible and powerful (only with parser="earley")
- "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible.
ambiguity
Decides how to handle ambiguity in the parse. Only relevant if parser="earley"

- "resolve" - The parser will automatically choose the simplest derivation
(it chooses consistently: greedy for tokens, non-greedy for rules)
- "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest).

**=== Misc. / Domain Specific Options ===**

postlex
Lexer post-processing (Default: None) Only works with the standard and contextual lexers.
priority
How priorities should be evaluated - auto, none, normal, invert (Default: auto)
lexer_callbacks
Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
use_bytes
Accept an input of type ``bytes`` instead of ``str`` (Python 3 only).
edit_terminals
A callback for editing the terminals before parse.

**=== End Options ===**
"""
if __doc__:
__doc__ += OPTIONS_DOC
@@ -92,11 +106,13 @@ class LarkOptions(Serialize):
'start': 'start',
'priority': 'auto',
'ambiguity': 'auto',
'regex': False,
'propagate_positions': False,
'lexer_callbacks': {},
'maybe_placeholders': False,
'edit_terminals': None,
'g_regex_flags': 0,
'use_bytes': False,
}

def __init__(self, options_dict):
@@ -106,7 +122,7 @@ class LarkOptions(Serialize):
for name, default in self._defaults.items():
if name in o:
value = o.pop(name)
if isinstance(default, bool) and name != 'cache':
if isinstance(default, bool) and name not in ('cache', 'use_bytes'):
value = bool(value)
else:
value = default
@@ -146,14 +162,31 @@ class LarkOptions(Serialize):


class Lark(Serialize):
def __init__(self, grammar, **options):
"""
grammar : a string or file-object containing the grammar spec (using Lark's ebnf syntax)
options : a dictionary controlling various aspects of Lark.
"""
"""Main interface for the library.

It's mostly a thin wrapper for the many different parsers, and for the tree constructor.

Parameters:
grammar: a string or file-object containing the grammar spec (using Lark's ebnf syntax)
options: a dictionary controlling various aspects of Lark.

Example:
>>> Lark(r'''start: "foo" ''')
Lark(...)
"""
def __init__(self, grammar, **options):
self.options = LarkOptions(options)

# Set regex or re module
use_regex = self.options.regex
if use_regex:
if regex:
re_module = regex
else:
raise ImportError('`regex` module must be installed if calling `Lark(regex=True)`.')
else:
re_module = re

# Some, but not all file-like objects have a 'name' attribute
try:
self.source = grammar.name
@@ -169,6 +202,13 @@ class Lark(Serialize):
grammar = read()

assert isinstance(grammar, STRING_TYPE)
self.grammar_source = grammar
if self.options.use_bytes:
if not isascii(grammar):
raise ValueError("Grammar must be ascii only, when use_bytes=True")
if sys.version_info[0] == 2 and self.options.use_bytes != 'force':
raise NotImplementedError("`use_bytes=True` may have issues on python2. "
"Use `use_bytes='force'` to use it at your own risk.")

cache_fn = None
if self.options.cache:
@@ -178,15 +218,16 @@ class Lark(Serialize):
cache_fn = self.options.cache
else:
if self.options.cache is not True:
raise ValueError("cache must be bool or str")
raise ValueError("cache argument must be bool or str")
unhashable = ('transformer', 'postlex', 'lexer_callbacks', 'edit_terminals')
from . import __version__
options_str = ''.join(k+str(v) for k, v in options.items() if k not in unhashable)
s = grammar + options_str
s = grammar + options_str + __version__
md5 = hashlib.md5(s.encode()).hexdigest()
cache_fn = '.lark_cache_%s.tmp' % md5

if FS.exists(cache_fn):
logging.debug('Loading grammar from cache: %s', cache_fn)
logger.debug('Loading grammar from cache: %s', cache_fn)
with FS.open(cache_fn, 'rb') as f:
self._load(f, self.options.transformer, self.options.postlex)
return
@@ -224,7 +265,7 @@ class Lark(Serialize):
assert self.options.ambiguity in ('resolve', 'explicit', 'auto', )

# Parse the grammar file and compose the grammars (TODO)
self.grammar = load_grammar(grammar, self.source)
self.grammar = load_grammar(grammar, self.source, re_module)

# Compile the EBNF grammar into BNF
self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)
@@ -233,7 +274,7 @@ class Lark(Serialize):
for t in self.terminals:
self.options.edit_terminals(t)

self._terminals_dict = {t.name:t for t in self.terminals}
self._terminals_dict = {t.name: t for t in self.terminals}

# If the user asked to invert the priorities, negate them all here.
# This replaces the old 'resolve__antiscore_sum' option.
@@ -250,14 +291,12 @@ class Lark(Serialize):
rule.options.priority = None

# TODO Deprecate lexer_callbacks?
lexer_callbacks = dict(self.options.lexer_callbacks)
if self.options.transformer:
t = self.options.transformer
for term in self.terminals:
if hasattr(t, term.name):
lexer_callbacks[term.name] = getattr(t, term.name)
lexer_callbacks = (_get_lexer_callbacks(self.options.transformer, self.terminals)
if self.options.transformer
else {})
lexer_callbacks.update(self.options.lexer_callbacks)

self.lexer_conf = LexerConf(self.terminals, self.ignore_tokens, self.options.postlex, lexer_callbacks, self.options.g_regex_flags)
self.lexer_conf = LexerConf(self.terminals, re_module, self.ignore_tokens, self.options.postlex, lexer_callbacks, self.options.g_regex_flags, use_bytes=self.options.use_bytes)

if self.options.parser:
self.parser = self._build_parser()
@@ -265,17 +304,16 @@ class Lark(Serialize):
self.lexer = self._build_lexer()

if cache_fn:
logging.debug('Saving grammar to cache: %s', cache_fn)
logger.debug('Saving grammar to cache: %s', cache_fn)
with FS.open(cache_fn, 'wb') as f:
self.save(f)

if __init__.__doc__:
__init__.__doc__ += "\nOptions:\n" + LarkOptions.OPTIONS_DOC
__doc__ += "\n\n" + LarkOptions.OPTIONS_DOC

__serialize_fields__ = 'parser', 'rules', 'options'

def _build_lexer(self):
return TraditionalLexer(self.lexer_conf.tokens, ignore=self.lexer_conf.ignore, user_callbacks=self.lexer_conf.callbacks, g_regex_flags=self.lexer_conf.g_regex_flags)
return TraditionalLexer(self.lexer_conf)

def _prepare_callbacks(self):
self.parser_class = get_frontend(self.options.parser, self.options.lexer)
@@ -288,11 +326,19 @@ class Lark(Serialize):
return self.parser_class(self.lexer_conf, parser_conf, options=self.options)

def save(self, f):
"""Saves the instance into the given file object

Useful for caching and multiprocessing.
"""
data, m = self.memo_serialize([TerminalDef, Rule])
pickle.dump({'data': data, 'memo': m}, f)

@classmethod
def load(cls, f):
"""Loads an instance from the given file object

Useful for caching and multiprocessing.
"""
inst = cls.__new__(cls)
return inst._load(f)

@@ -312,10 +358,18 @@ class Lark(Serialize):
if postlex is not None:
options['postlex'] = postlex
self.options = LarkOptions.deserialize(options, memo)
re_module = regex if self.options.regex else re
self.rules = [Rule.deserialize(r, memo) for r in data['rules']]
self.source = '<deserialized>'
self._prepare_callbacks()
self.parser = self.parser_class.deserialize(data['parser'], memo, self._callbacks, self.options.postlex)
self.parser = self.parser_class.deserialize(
data['parser'],
memo,
self._callbacks,
self.options.postlex,
self.options.transformer,
re_module
)
return self

@classmethod
@@ -327,7 +381,7 @@ class Lark(Serialize):
def open(cls, grammar_filename, rel_to=None, **options):
"""Create an instance of Lark with the grammar given by its filename

If rel_to is provided, the function will find the grammar filename in relation to it.
If ``rel_to`` is provided, the function will find the grammar filename in relation to it.

Example:

@@ -358,13 +412,37 @@ class Lark(Serialize):
"Get information about a terminal"
return self._terminals_dict[name]

def parse(self, text, start=None):
def parse(self, text, start=None, on_error=None):
"""Parse the given text, according to the options provided.

The 'start' parameter is required if Lark was given multiple possible start symbols (using the start option).
Parameters:
text (str): Text to be parsed.
start (str, optional): Required if Lark was given multiple possible start symbols (using the start option).
on_error (function, optional): If provided, will be called on an UnexpectedToken error. Return ``True`` to resume parsing.
LALR only. See examples/error_puppet.py for an example of how to use ``on_error``.

Returns:
If a transformer is supplied to ``__init__``, returns the result of the
transformation. Otherwise, returns a ``Tree`` instance.

Returns a tree, unless specified otherwise.
"""
return self.parser.parse(text, start=start)

try:
return self.parser.parse(text, start=start)
except UnexpectedToken as e:
if on_error is None:
raise

while True:
if not on_error(e):
raise e
try:
return e.puppet.resume_parse()
except UnexpectedToken as e2:
if e.token.type == e2.token.type == '$END' and e.puppet == e2.puppet:
# Prevent infinite loop
raise e2
e = e2


###}
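To illustrate the new ``on_error`` hook, here is a minimal sketch; the grammar and input are made up, and real recovery code usually inspects ``e.token`` and may feed tokens through ``e.puppet``, as in examples/error_puppet.py:

    from lark import Lark

    parser = Lark(r'''
        start: "a" "b" "c"
        %ignore " "
    ''', parser='lalr')

    def skip_bad_token(e):
        # Receives the UnexpectedToken exception; a truthy return value
        # resumes parsing from the puppet state, skipping the bad token.
        print('Skipping', repr(e.token), '- expected one of', e.expected)
        return True

    # The duplicated "b" would normally abort the parse; here it is skipped.
    tree = parser.parse('a b b c', on_error=skip_bad_token)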

+ 73
- 33
lark/lexer.py View File

@@ -6,6 +6,7 @@ from .utils import Str, classify, get_regexp_width, Py36, Serialize
from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken

###{standalone
from copy import copy

class Pattern(Serialize):

@@ -88,8 +89,25 @@ class TerminalDef(Serialize):
return '%s(%r, %r)' % (type(self).__name__, self.name, self.pattern)



class Token(Str):
"""A string with meta-information, that is produced by the lexer.

When parsing text, the resulting chunks of the input that haven't been discarded,
will end up in the tree as Token instances. The Token class inherits from Python's ``str``,
so normal string comparisons and operations will work as expected.

Attributes:
type: Name of the token (as specified in grammar)
value: Value of the token (redundant, as ``token.value == token`` will always be true)
pos_in_stream: The index of the token in the text
line: The line of the token in the text (starting with 1)
column: The column of the token in the text (starting with 1)
end_line: The line where the token ends
end_column: The next column after the end of the token. For example,
if the token is a single character with a column value of 4,
end_column will be 5.
end_pos: the index where the token ends (basically ``pos_in_stream + len(token)``)
"""
__slots__ = ('type', 'pos_in_stream', 'value', 'line', 'column', 'end_line', 'end_column', 'end_pos')

def __new__(cls, type_, value, pos_in_stream=None, line=None, column=None, end_line=None, end_column=None, end_pos=None):
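A small sketch of what these attributes look like in practice (the grammar and input are illustrative):

    from lark import Lark

    parser = Lark(r'''
        start: WORD
        WORD: /\w+/
    ''', parser='lalr')

    tok = parser.parse('hello').children[0]
    print(tok.type, repr(tok.value))        # WORD 'hello'
    print(tok.line, tok.column)             # 1 1
    print(tok.pos_in_stream, tok.end_pos)   # 0 5
    print(tok.end_line, tok.end_column)     # 1 6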
@@ -139,8 +157,8 @@ class Token(Str):


class LineCounter:
def __init__(self):
self.newline_char = '\n'
def __init__(self, newline_char):
self.newline_char = newline_char
self.char_pos = 0
self.line = 1
self.column = 1
@@ -169,7 +187,7 @@ class _Lex:
def lex(self, stream, newline_types, ignore_types):
newline_types = frozenset(newline_types)
ignore_types = frozenset(ignore_types)
line_ctr = LineCounter()
line_ctr = LineCounter('\n' if not self.lexer.use_bytes else b'\n')
last_token = None

while line_ctr.char_pos < len(stream):
@@ -230,7 +248,7 @@ class CallChain:



def _create_unless(terminals, g_regex_flags):
def _create_unless(terminals, g_regex_flags, re_, use_bytes):
tokens_by_type = classify(terminals, lambda t: type(t.pattern))
assert len(tokens_by_type) <= 2, tokens_by_type.keys()
embedded_strs = set()
@@ -241,37 +259,40 @@ def _create_unless(terminals, g_regex_flags):
if strtok.priority > retok.priority:
continue
s = strtok.pattern.value
m = re.match(retok.pattern.to_regexp(), s, g_regex_flags)
m = re_.match(retok.pattern.to_regexp(), s, g_regex_flags)
if m and m.group(0) == s:
unless.append(strtok)
if strtok.pattern.flags <= retok.pattern.flags:
embedded_strs.add(strtok)
if unless:
callback[retok.name] = UnlessCallback(build_mres(unless, g_regex_flags, match_whole=True))
callback[retok.name] = UnlessCallback(build_mres(unless, g_regex_flags, re_, match_whole=True, use_bytes=use_bytes))

terminals = [t for t in terminals if t not in embedded_strs]
return terminals, callback


def _build_mres(terminals, max_size, g_regex_flags, match_whole):
def _build_mres(terminals, max_size, g_regex_flags, match_whole, re_, use_bytes):
# Python sets an unreasonable group limit (currently 100) in its re module
# Worse, the only way to know we reached it is by catching an AssertionError!
# This function recursively tries fewer and fewer groups until it succeeds.
postfix = '$' if match_whole else ''
mres = []
while terminals:
pattern = u'|'.join(u'(?P<%s>%s)' % (t.name, t.pattern.to_regexp() + postfix) for t in terminals[:max_size])
if use_bytes:
pattern = pattern.encode('latin-1')
try:
mre = re.compile(u'|'.join(u'(?P<%s>%s)'%(t.name, t.pattern.to_regexp()+postfix) for t in terminals[:max_size]), g_regex_flags)
mre = re_.compile(pattern, g_regex_flags)
except AssertionError: # Yes, this is what Python provides us.. :/
return _build_mres(terminals, max_size//2, g_regex_flags, match_whole)
return _build_mres(terminals, max_size//2, g_regex_flags, match_whole, re_, use_bytes)

# terms_from_name = {t.name: t for t in terminals[:max_size]}
mres.append((mre, {i:n for n,i in mre.groupindex.items()} ))
terminals = terminals[max_size:]
return mres

def build_mres(terminals, g_regex_flags, match_whole=False):
return _build_mres(terminals, len(terminals), g_regex_flags, match_whole)
def build_mres(terminals, g_regex_flags, re_, use_bytes, match_whole=False):
return _build_mres(terminals, len(terminals), g_regex_flags, match_whole, re_, use_bytes)

def _regexp_has_newline(r):
r"""Expressions that may indicate newlines in a regexp:
@@ -294,34 +315,40 @@ class Lexer(object):

class TraditionalLexer(Lexer):

def __init__(self, terminals, ignore=(), user_callbacks={}, g_regex_flags=0):
def __init__(self, conf):
terminals = list(conf.tokens)
assert all(isinstance(t, TerminalDef) for t in terminals), terminals

terminals = list(terminals)
self.re = conf.re_module

# Sanitization
for t in terminals:
try:
re.compile(t.pattern.to_regexp(), g_regex_flags)
except re.error:
raise LexError("Cannot compile token %s: %s" % (t.name, t.pattern))
if not conf.skip_validation:
# Sanitization
for t in terminals:
try:
self.re.compile(t.pattern.to_regexp(), conf.g_regex_flags)
except self.re.error:
raise LexError("Cannot compile token %s: %s" % (t.name, t.pattern))

if t.pattern.min_width == 0:
raise LexError("Lexer does not allow zero-width terminals. (%s: %s)" % (t.name, t.pattern))
if t.pattern.min_width == 0:
raise LexError("Lexer does not allow zero-width terminals. (%s: %s)" % (t.name, t.pattern))

assert set(ignore) <= {t.name for t in terminals}
assert set(conf.ignore) <= {t.name for t in terminals}

# Init
self.newline_types = [t.name for t in terminals if _regexp_has_newline(t.pattern.to_regexp())]
self.ignore_types = list(ignore)
self.ignore_types = list(conf.ignore)

terminals.sort(key=lambda x:(-x.priority, -x.pattern.max_width, -len(x.pattern.value), x.name))
self.terminals = terminals
self.user_callbacks = user_callbacks
self.build(g_regex_flags)
self.user_callbacks = conf.callbacks
self.g_regex_flags = conf.g_regex_flags
self.use_bytes = conf.use_bytes

def build(self, g_regex_flags=0):
terminals, self.callback = _create_unless(self.terminals, g_regex_flags)
self._mres = None
# self.build(g_regex_flags)

def _build(self):
terminals, self.callback = _create_unless(self.terminals, self.g_regex_flags, re_=self.re, use_bytes=self.use_bytes)
assert all(self.callback.values())

for type_, f in self.user_callbacks.items():
@@ -331,7 +358,13 @@ class TraditionalLexer(Lexer):
else:
self.callback[type_] = f

self.mres = build_mres(terminals, g_regex_flags)
self._mres = build_mres(terminals, self.g_regex_flags, self.re, self.use_bytes)

@property
def mres(self):
if self._mres is None:
self._build()
return self._mres

def match(self, stream, pos):
for mre, type_from_index in self.mres:
@@ -347,12 +380,16 @@ class TraditionalLexer(Lexer):

class ContextualLexer(Lexer):

def __init__(self, terminals, states, ignore=(), always_accept=(), user_callbacks={}, g_regex_flags=0):
def __init__(self, conf, states, always_accept=()):
terminals = list(conf.tokens)
tokens_by_name = {}
for t in terminals:
assert t.name not in tokens_by_name, t
tokens_by_name[t.name] = t

trad_conf = copy(conf)
trad_conf.tokens = terminals

lexer_by_tokens = {}
self.lexers = {}
for state, accepts in states.items():
@@ -360,14 +397,17 @@ class ContextualLexer(Lexer):
try:
lexer = lexer_by_tokens[key]
except KeyError:
accepts = set(accepts) | set(ignore) | set(always_accept)
accepts = set(accepts) | set(conf.ignore) | set(always_accept)
state_tokens = [tokens_by_name[n] for n in accepts if n and n in tokens_by_name]
lexer = TraditionalLexer(state_tokens, ignore=ignore, user_callbacks=user_callbacks, g_regex_flags=g_regex_flags)
lexer_conf = copy(trad_conf)
lexer_conf.tokens = state_tokens
lexer = TraditionalLexer(lexer_conf)
lexer_by_tokens[key] = lexer

self.lexers[state] = lexer

self.root_lexer = TraditionalLexer(terminals, ignore=ignore, user_callbacks=user_callbacks, g_regex_flags=g_regex_flags)
assert trad_conf.tokens is terminals
self.root_lexer = TraditionalLexer(trad_conf)

def lex(self, stream, get_parser_state):
parser_state = get_parser_state()


+ 76
- 36
lark/load_grammar.py View File

@@ -5,7 +5,7 @@ import sys
from copy import copy, deepcopy
from io import open

from .utils import bfs, eval_escaping
from .utils import bfs, eval_escaping, Py36, logger, classify_bool
from .lexer import Token, TerminalDef, PatternStr, PatternRE

from .parse_tree_builder import ParseTreeBuilder
@@ -85,7 +85,7 @@ TERMINALS = {
'RULE': '!?[_?]?[a-z][_a-z0-9]*',
'TERMINAL': '_?[A-Z][_A-Z0-9]*',
'STRING': r'"(\\"|\\\\|[^"\n])*?"i?',
'REGEXP': r'/(?!/)(\\/|\\\\|[^/\n])*?/[%s]*' % _RE_FLAGS,
'REGEXP': r'/(?!/)(\\/|\\\\|[^/])*?/[%s]*' % _RE_FLAGS,
'_NL': r'(\r?\n)+\s*',
'WS': r'[ \t]+',
'COMMENT': r'\s*//[^\n]*',
@@ -307,6 +307,7 @@ class PrepareAnonTerminals(Transformer_InPlace):
self.term_set = {td.name for td in self.terminals}
self.term_reverse = {td.pattern: td for td in terminals}
self.i = 0
self.rule_options = None


@inline_args
@@ -335,7 +336,7 @@ class PrepareAnonTerminals(Transformer_InPlace):
term_name = None

elif isinstance(p, PatternRE):
if p in self.term_reverse: # Kind of a wierd placement.name
if p in self.term_reverse: # Kind of a weird placement.name
term_name = self.term_reverse[p].name
else:
assert False, p
@@ -351,7 +352,10 @@ class PrepareAnonTerminals(Transformer_InPlace):
self.term_reverse[p] = termdef
self.terminals.append(termdef)

return Terminal(term_name, filter_out=isinstance(p, PatternStr))
filter_out = False if self.rule_options and self.rule_options.keep_all_tokens else isinstance(p, PatternStr)

return Terminal(term_name, filter_out=filter_out)


class _ReplaceSymbols(Transformer_InPlace):
" Helper for ApplyTemplates "
@@ -405,6 +409,13 @@ def _literal_to_pattern(literal):
flags = v[flag_start:]
assert all(f in _RE_FLAGS for f in flags), flags

if literal.type == 'STRING' and '\n' in v:
raise GrammarError('You cannot put newlines in string literals')

if literal.type == 'REGEXP' and '\n' in v and 'x' not in flags:
raise GrammarError('You can only use newlines in regular expressions '
'with the `x` (verbose) flag')

v = v[:flag_start]
assert v[0] == v[-1] and v[0] in '"/'
x = v[1:-1]
@@ -413,9 +424,11 @@ def _literal_to_pattern(literal):

if literal.type == 'STRING':
s = s.replace('\\\\', '\\')

return { 'STRING': PatternStr,
'REGEXP': PatternRE }[literal.type](s, flags)
return PatternStr(s, flags)
elif literal.type == 'REGEXP':
return PatternRE(s, flags)
else:
assert False, 'Invariant failed: literal.type not in ["STRING", "REGEXP"]'


@inline_args
@@ -432,6 +445,20 @@ class PrepareLiterals(Transformer_InPlace):
return ST('pattern', [PatternRE(regexp)])


def _make_joined_pattern(regexp, flags_set):
# In Python 3.6, a new syntax for flags was introduced, that allows us to restrict the scope
# of flags to a specific regexp group. We are already using it in `lexer.Pattern._get_flags`
# However, for prior Python versions, we still need to use global flags, so we have to make sure
# that there are no flag collisions when we merge several terminals.
flags = ()
if not Py36:
if len(flags_set) > 1:
raise GrammarError("Lark doesn't support joining terminals with conflicting flags in python <3.6!")
elif len(flags_set) == 1:
flags ,= flags_set

return PatternRE(regexp, flags)
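
For context, the Python 3.6 scoped-flag syntax mentioned above limits a flag to a single group, which is what makes it safe to join terminals carrying different flags; a standalone sketch with the stdlib ``re``:

    import re

    # Scoped flag (Python 3.6+): case-insensitivity applies only inside the group.
    scoped = re.compile(r'(?i:hello) world')
    assert scoped.match('HELLO world')
    assert scoped.match('HELLO WORLD') is None

    # Global flag: applies to the whole pattern, which is why terminals with
    # conflicting flags cannot be merged on older Pythons.
    global_flag = re.compile(r'hello world', re.IGNORECASE)
    assert global_flag.match('HELLO WORLD')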

class TerminalTreeToPattern(Transformer):
def pattern(self, ps):
p ,= ps
@@ -441,16 +468,16 @@ class TerminalTreeToPattern(Transformer):
assert items
if len(items) == 1:
return items[0]
if len({i.flags for i in items}) > 1:
raise GrammarError("Lark doesn't support joining terminals with conflicting flags!")
return PatternRE(''.join(i.to_regexp() for i in items), items[0].flags if items else ())
pattern = ''.join(i.to_regexp() for i in items)
return _make_joined_pattern(pattern, {i.flags for i in items})

def expansions(self, exps):
if len(exps) == 1:
return exps[0]
if len({i.flags for i in exps}) > 1:
raise GrammarError("Lark doesn't support joining terminals with conflicting flags!")
return PatternRE('(?:%s)' % ('|'.join(i.to_regexp() for i in exps)), exps[0].flags)
pattern = '(?:%s)' % ('|'.join(i.to_regexp() for i in exps))
return _make_joined_pattern(pattern, {i.flags for i in exps})

def expr(self, args):
inner, op = args[:2]
@@ -527,7 +554,8 @@ class Grammar:
# =================

# 1. Pre-process terminals
transformer = PrepareLiterals() * PrepareSymbols() * PrepareAnonTerminals(terminals) # Adds to terminals
anon_tokens_transf = PrepareAnonTerminals(terminals)
transformer = PrepareLiterals() * PrepareSymbols() * anon_tokens_transf # Adds to terminals

# 2. Inline Templates

@@ -542,8 +570,10 @@ class Grammar:
i += 1
if len(params) != 0: # Dont transform templates
continue
ebnf_to_bnf.rule_options = RuleOptions(keep_all_tokens=True) if options.keep_all_tokens else None
rule_options = RuleOptions(keep_all_tokens=True) if options and options.keep_all_tokens else None
ebnf_to_bnf.rule_options = rule_options
ebnf_to_bnf.prefix = name
anon_tokens_transf.rule_options = rule_options
tree = transformer.transform(rule_tree)
res = ebnf_to_bnf.transform(tree)
rules.append((name, res, options))
@@ -601,7 +631,9 @@ class Grammar:
if isinstance(s, NonTerminal)
and s != r.origin}
used_rules |= {NonTerminal(s) for s in start}
compiled_rules = [r for r in compiled_rules if r.origin in used_rules]
compiled_rules, unused = classify_bool(compiled_rules, lambda r: r.origin in used_rules)
for r in unused:
logger.debug("Unused rule: %s", r)
if len(compiled_rules) == c:
break

@@ -609,14 +641,16 @@ class Grammar:
used_terms = {t.name for r in compiled_rules
for t in r.expansion
if isinstance(t, Terminal)}
terminals = [t for t in terminals if t.name in used_terms or t.name in self.ignore]
terminals, unused = classify_bool(terminals, lambda t: t.name in used_terms or t.name in self.ignore)
if unused:
logger.debug("Unused terminals: %s", [t.name for t in unused])

return terminals, compiled_rules, self.ignore



_imported_grammars = {}
def import_grammar(grammar_path, base_paths=[]):
def import_grammar(grammar_path, re_, base_paths=[]):
if grammar_path not in _imported_grammars:
import_paths = base_paths + IMPORT_PATHS
for import_path in import_paths:
@@ -624,7 +658,7 @@ def import_grammar(grammar_path, base_paths=[]):
joined_path = os.path.join(import_path, grammar_path)
with open(joined_path, encoding='utf8') as f:
text = f.read()
grammar = load_grammar(text, joined_path)
grammar = load_grammar(text, joined_path, re_)
_imported_grammars[grammar_path] = grammar
break
else:
@@ -755,18 +789,33 @@ def _find_used_symbols(tree):
for t in x.scan_values(lambda t: t.type in ('RULE', 'TERMINAL'))}

class GrammarLoader:
def __init__(self):
ERRORS = [
('Unclosed parenthesis', ['a: (\n']),
('Unmatched closing parenthesis', ['a: )\n', 'a: [)\n', 'a: (]\n']),
('Expecting rule or terminal definition (missing colon)', ['a\n', 'A\n', 'a->\n', 'A->\n', 'a A\n']),
('Illegal name for rules or terminals', ['Aa:\n']),
('Alias expects lowercase name', ['a: -> "a"\n']),
('Unexpected colon', ['a::\n', 'a: b:\n', 'a: B:\n', 'a: "a":\n']),
('Misplaced operator', ['a: b??', 'a: b(?)', 'a:+\n', 'a:?\n', 'a:*\n', 'a:|*\n']),
('Expecting option ("|") or a new rule or terminal definition', ['a:a\n()\n']),
('Terminal names cannot contain dots', ['A.B\n']),
('%import expects a name', ['%import "a"\n']),
('%ignore expects a value', ['%ignore %import\n']),
]

def __init__(self, re_module):
terminals = [TerminalDef(name, PatternRE(value)) for name, value in TERMINALS.items()]

rules = [options_from_rule(name, None, x) for name, x in RULES.items()]
rules = [Rule(NonTerminal(r), symbols_from_strcase(x.split()), i, None, o) for r, _p, xs, o in rules for i, x in enumerate(xs)]
callback = ParseTreeBuilder(rules, ST).create_callback()
lexer_conf = LexerConf(terminals, ['WS', 'COMMENT'])
lexer_conf = LexerConf(terminals, re_module, ['WS', 'COMMENT'])

parser_conf = ParserConf(rules, callback, ['start'])
self.parser = LALR_TraditionalLexer(lexer_conf, parser_conf)

self.canonize_tree = CanonizeTree()
self.re_module = re_module

def load_grammar(self, grammar_text, grammar_name='<?>'):
"Parse grammar_text, verify, and create Grammar object. Display nice messages on error."
@@ -779,19 +828,9 @@ class GrammarLoader:
(e.line, e.column, grammar_name, context))
except UnexpectedToken as e:
context = e.get_context(grammar_text)
error = e.match_examples(self.parser.parse, {
'Unclosed parenthesis': ['a: (\n'],
'Umatched closing parenthesis': ['a: )\n', 'a: [)\n', 'a: (]\n'],
'Expecting rule or terminal definition (missing colon)': ['a\n', 'a->\n', 'A->\n', 'a A\n'],
'Alias expects lowercase name': ['a: -> "a"\n'],
'Unexpected colon': ['a::\n', 'a: b:\n', 'a: B:\n', 'a: "a":\n'],
'Misplaced operator': ['a: b??', 'a: b(?)', 'a:+\n', 'a:?\n', 'a:*\n', 'a:|*\n'],
'Expecting option ("|") or a new rule or terminal definition': ['a:a\n()\n'],
'%import expects a name': ['%import "a"\n'],
'%ignore expects a value': ['%ignore %import\n'],
})
error = e.match_examples(self.parser.parse, self.ERRORS, use_accepts=True)
if error:
raise GrammarError("%s at line %s column %s\n\n%s" % (error, e.line, e.column, context))
raise GrammarError("%s, at line %s column %s\n\n%s" % (error, e.line, e.column, context))
elif 'STRING' in e.expected:
raise GrammarError("Expecting a value at line %s column %s\n\n%s" % (e.line, e.column, context))
raise
@@ -819,7 +858,7 @@ class GrammarLoader:
if len(stmt.children) > 1:
path_node, arg1 = stmt.children
else:
path_node, = stmt.children
path_node ,= stmt.children
arg1 = None

if isinstance(arg1, Tree): # Multi import
@@ -862,7 +901,7 @@ class GrammarLoader:
# import grammars
for dotted_path, (base_paths, aliases) in imports.items():
grammar_path = os.path.join(*dotted_path) + EXT
g = import_grammar(grammar_path, base_paths=base_paths)
g = import_grammar(grammar_path, self.re_module, base_paths=base_paths)
new_td, new_rd = import_from_grammar_into_namespace(g, '__'.join(dotted_path), aliases)

term_defs += new_td
@@ -942,4 +981,5 @@ class GrammarLoader:



load_grammar = GrammarLoader().load_grammar
def load_grammar(grammar, source, re_):
return GrammarLoader(re_).load_grammar(grammar, source)

+ 81
- 0
lark/parse_tree_builder.py View File

@@ -195,6 +195,86 @@ def maybe_create_ambiguous_expander(tree_class, expansion, keep_all_tokens):
if to_expand:
return partial(AmbiguousExpander, to_expand, tree_class)

class AmbiguousIntermediateExpander:
"""
Propagate ambiguous intermediate nodes and their derivations up to the
current rule.

In general, converts

rule
_iambig
_inter
someChildren1
...
_inter
someChildren2
...
someChildren3
...

to

_ambig
rule
someChildren1
...
someChildren3
...
rule
someChildren2
...
someChildren3
...
rule
childrenFromNestedIambigs
...
someChildren3
...
...

propagating up any nested '_iambig' nodes along the way.
"""

def __init__(self, tree_class, node_builder):
self.node_builder = node_builder
self.tree_class = tree_class

def __call__(self, children):
def _is_iambig_tree(child):
return hasattr(child, 'data') and child.data == '_iambig'

def _collapse_iambig(children):
"""
Recursively flatten the derivations of the parent of an '_iambig'
node. Returns a list of '_inter' nodes guaranteed not
to contain any nested '_iambig' nodes, or None if children does
not contain an '_iambig' node.
"""

# Due to the structure of the SPPF,
# an '_iambig' node can only appear as the first child
if children and _is_iambig_tree(children[0]):
iambig_node = children[0]
result = []
for grandchild in iambig_node.children:
collapsed = _collapse_iambig(grandchild.children)
if collapsed:
for child in collapsed:
child.children += children[1:]
result += collapsed
else:
new_tree = self.tree_class('_inter', grandchild.children + children[1:])
result.append(new_tree)
return result

collapsed = _collapse_iambig(children)
if collapsed:
processed_nodes = [self.node_builder(c.children) for c in collapsed]
return self.tree_class('_ambig', processed_nodes)

return self.node_builder(children)

def ptb_inline_args(func):
@wraps(func)
def f(children):
@@ -239,6 +319,7 @@ class ParseTreeBuilder:
maybe_create_child_filter(rule.expansion, keep_all_tokens, self.ambiguous, options.empty_indices if self.maybe_placeholders else None),
self.propagate_positions and PropagatePositions,
self.ambiguous and maybe_create_ambiguous_expander(self.tree_class, rule.expansion, keep_all_tokens),
self.ambiguous and partial(AmbiguousIntermediateExpander, self.tree_class)
]))

yield rule, wrapper_chain


+ 32
- 15
lark/parser_frontends.py View File

@@ -1,9 +1,6 @@
import re
from functools import partial

from .utils import get_regexp_width, Serialize
from .parsers.grammar_analysis import GrammarAnalyzer
from .lexer import TraditionalLexer, ContextualLexer, Lexer, Token
from .lexer import TraditionalLexer, ContextualLexer, Lexer, Token, TerminalDef
from .parsers import earley, xearley, cyk
from .parsers.lalr_parser import LALR_Parser
from .grammar import Rule
@@ -21,7 +18,14 @@ def get_frontend(parser, lexer):
elif lexer == 'contextual':
return LALR_ContextualLexer
elif issubclass(lexer, Lexer):
return partial(LALR_CustomLexer, lexer)
class LALR_CustomLexerWrapper(LALR_CustomLexer):
def __init__(self, lexer_conf, parser_conf, options=None):
super(LALR_CustomLexerWrapper, self).__init__(
lexer, lexer_conf, parser_conf, options=options)
def init_lexer(self):
self.lexer = lexer(self.lexer_conf)

return LALR_CustomLexerWrapper
else:
raise ValueError('Unknown lexer: %s' % lexer)
elif parser=='earley':
@@ -54,6 +58,15 @@ class _ParserFrontend(Serialize):
return self.parser.parse(input, start, *args)


def _get_lexer_callbacks(transformer, terminals):
result = {}
for terminal in terminals:
callback = getattr(transformer, terminal.name, None)
if callback is not None:
result[terminal.name] = callback
return result
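
A toy sketch of what ``_get_lexer_callbacks`` (defined just above) collects; the transformer and terminal names are made up, and a namedtuple stands in for ``TerminalDef`` since only ``.name`` is looked at:

    from collections import namedtuple

    Term = namedtuple('Term', 'name')   # stand-in for TerminalDef in this sketch

    class MyTransformer:
        def NUMBER(self, token):
            # Would be called on every NUMBER token during lexing, when this
            # transformer is passed to Lark together with parser='lalr'.
            return token

    terminals = [Term('NUMBER'), Term('NAME')]
    callbacks = _get_lexer_callbacks(MyTransformer(), terminals)
    assert list(callbacks) == ['NUMBER']   # only terminals with a matching method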


class WithLexer(_ParserFrontend):
lexer = None
parser = None
@@ -69,11 +82,18 @@ class WithLexer(_ParserFrontend):
self.postlex = lexer_conf.postlex

@classmethod
def deserialize(cls, data, memo, callbacks, postlex):
def deserialize(cls, data, memo, callbacks, postlex, transformer, re_module):
inst = super(WithLexer, cls).deserialize(data, memo)

inst.postlex = postlex
inst.parser = LALR_Parser.deserialize(inst.parser, memo, callbacks)

terminals = [item for item in memo.values() if isinstance(item, TerminalDef)]
inst.lexer_conf.callbacks = _get_lexer_callbacks(transformer, terminals)
inst.lexer_conf.re_module = re_module
inst.lexer_conf.skip_validation=True
inst.init_lexer()

return inst

def _serialize(self, data, memo):
@@ -88,7 +108,7 @@ class WithLexer(_ParserFrontend):
return self._parse(token_stream, start)

def init_traditional_lexer(self):
self.lexer = TraditionalLexer(self.lexer_conf.tokens, ignore=self.lexer_conf.ignore, user_callbacks=self.lexer_conf.callbacks, g_regex_flags=self.lexer_conf.g_regex_flags)
self.lexer = TraditionalLexer(self.lexer_conf)

class LALR_WithLexer(WithLexer):
def __init__(self, lexer_conf, parser_conf, options=None):
@@ -98,7 +118,7 @@ class LALR_WithLexer(WithLexer):

self.init_lexer()

def init_lexer(self):
def init_lexer(self, **kw):
raise NotImplementedError()

class LALR_TraditionalLexer(LALR_WithLexer):
@@ -109,11 +129,7 @@ class LALR_ContextualLexer(LALR_WithLexer):
def init_lexer(self):
states = {idx:list(t.keys()) for idx, t in self.parser._parse_table.states.items()}
always_accept = self.postlex.always_accept if self.postlex else ()
self.lexer = ContextualLexer(self.lexer_conf.tokens, states,
ignore=self.lexer_conf.ignore,
always_accept=always_accept,
user_callbacks=self.lexer_conf.callbacks,
g_regex_flags=self.lexer_conf.g_regex_flags)
self.lexer = ContextualLexer(self.lexer_conf, states, always_accept=always_accept)


def parse(self, text, start=None):
@@ -187,8 +203,10 @@ class XEarley(_ParserFrontend):
else:
if width == 0:
raise ValueError("Dynamic Earley doesn't allow zero-width regexps", t)
if lexer_conf.use_bytes:
regexp = regexp.encode('utf-8')

self.regexps[t.name] = re.compile(regexp, lexer_conf.g_regex_flags)
self.regexps[t.name] = lexer_conf.re_module.compile(regexp, lexer_conf.g_regex_flags)

def parse(self, text, start):
return self._parse(text, start)
@@ -225,4 +243,3 @@ class CYK(WithLexer):

def _apply_callback(self, tree):
return self.callbacks[tree.rule](tree.children)


+ 4
- 4
lark/parsers/earley.py View File

@@ -10,15 +10,15 @@ is better documented here:
http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/
"""

import logging
from collections import deque

from ..visitors import Transformer_InPlace, v_args
from ..exceptions import UnexpectedEOF, UnexpectedToken
from ..utils import logger
from .grammar_analysis import GrammarAnalyzer
from ..grammar import NonTerminal
from .earley_common import Item, TransitiveItem
from .earley_forest import ForestToTreeVisitor, ForestSumVisitor, SymbolNode, ForestToAmbiguousTreeVisitor
from .earley_forest import ForestToTreeVisitor, ForestSumVisitor, SymbolNode, CompleteForestToAmbiguousTreeVisitor

class Parser:
def __init__(self, parser_conf, term_matcher, resolve_ambiguity=True, debug=False):
@@ -301,7 +301,7 @@ class Parser:
try:
debug_walker = ForestToPyDotVisitor()
except ImportError:
logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
logger.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
else:
debug_walker.visit(solutions[0], "sppf.png")

@@ -313,7 +313,7 @@ class Parser:
assert False, 'Earley should not generate multiple start symbol items!'

# Perform our SPPF -> AST conversion using the right ForestVisitor.
forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor
forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else CompleteForestToAmbiguousTreeVisitor
forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor())

return forest_tree_visitor.visit(solutions[0])


+ 81
- 4
lark/parsers/earley_forest.py View File

@@ -13,6 +13,7 @@ from collections import deque
from operator import attrgetter
from importlib import import_module

from ..utils import logger
from ..tree import Tree
from ..exceptions import ParseError

@@ -328,10 +329,17 @@ class ForestToAmbiguousTreeVisitor(ForestToTreeVisitor):
self.output_stack[-1].children.append(node)

def visit_symbol_node_in(self, node):
if self.forest_sum_visitor and node.is_ambiguous and isinf(node.priority):
self.forest_sum_visitor.visit(node)
if not node.is_intermediate and node.is_ambiguous:
self.output_stack.append(Tree('_ambig', []))
if node.is_ambiguous:
if self.forest_sum_visitor and isinf(node.priority):
self.forest_sum_visitor.visit(node)
if node.is_intermediate:
# TODO Support ambiguous intermediate nodes!
logger.warning("Ambiguous intermediate node in the SPPF: %s. "
"Lark does not currently process these ambiguities; resolving with the first derivation.", node)
return next(iter(node.children))
else:
self.output_stack.append(Tree('_ambig', []))

return iter(node.children)

def visit_symbol_node_out(self, node):
@@ -355,6 +363,75 @@ class ForestToAmbiguousTreeVisitor(ForestToTreeVisitor):
else:
self.result = result

class CompleteForestToAmbiguousTreeVisitor(ForestToTreeVisitor):
"""
An augmented version of ForestToAmbiguousTreeVisitor that is designed to
handle ambiguous intermediate nodes as well as ambiguous symbol nodes.

On the way down:

- When an ambiguous intermediate node is encountered, an '_iambig' node
is inserted into the tree.
- Each possible derivation of an ambiguous intermediate node is represented
by an '_inter' node added as a child of the corresponding '_iambig' node.

On the way up, these nodes are propagated up the tree and collapsed
into a single '_ambig' node for the nearest symbol node ancestor.
This is achieved by the AmbiguousIntermediateExpander contained in
the callbacks.
"""

def _collapse_ambig(self, children):
new_children = []
for child in children:
if child.data == '_ambig':
new_children += child.children
else:
new_children.append(child)
return new_children

def visit_token_node(self, node):
self.output_stack[-1].children.append(node)

def visit_symbol_node_in(self, node):
if node.is_ambiguous:
if self.forest_sum_visitor and isinf(node.priority):
self.forest_sum_visitor.visit(node)
if node.is_intermediate:
self.output_stack.append(Tree('_iambig', []))
else:
self.output_stack.append(Tree('_ambig', []))
return iter(node.children)

def visit_symbol_node_out(self, node):
if node.is_ambiguous:
result = self.output_stack.pop()
if not node.is_intermediate:
result = Tree('_ambig', self._collapse_ambig(result.children))
if self.output_stack:
self.output_stack[-1].children.append(result)
else:
self.result = result

def visit_packed_node_in(self, node):
if not node.parent.is_intermediate:
self.output_stack.append(Tree('drv', []))
elif node.parent.is_ambiguous:
self.output_stack.append(Tree('_inter', []))
return iter([node.left, node.right])

def visit_packed_node_out(self, node):
if not node.parent.is_intermediate:
result = self.callbacks[node.rule](self.output_stack.pop().children)
elif node.parent.is_ambiguous:
result = self.output_stack.pop()
else:
return
if self.output_stack:
self.output_stack[-1].children.append(result)
else:
self.result = result

class ForestToPyDotVisitor(ForestVisitor):
"""
A Forest visitor which writes the SPPF to a PNG.


+ 1
- 1
lark/parsers/grammar_analysis.py View File

@@ -138,7 +138,7 @@ class GrammarAnalyzer(object):
for r in rules:
for sym in r.expansion:
if not (sym.is_term or sym in self.rules_by_origin):
raise GrammarError("Using an undefined rule: %s" % sym) # TODO test validation
raise GrammarError("Using an undefined rule: %s" % sym)

self.start_states = {start: self.expand_rule(root_rule.origin)
for start, root_rule in root_rules.items()}


+ 4
- 5
lark/parsers/lalr_analysis.py View File

@@ -6,10 +6,9 @@ For now, shift/reduce conflicts are automatically resolved as shifts.
# Author: Erez Shinan (2017)
# Email : erezshin@gmail.com

import logging
from collections import defaultdict, deque
from collections import defaultdict

from ..utils import classify, classify_bool, bfs, fzset, Serialize, Enumerator
from ..utils import classify, classify_bool, bfs, fzset, Enumerator, logger
from ..exceptions import GrammarError

from .grammar_analysis import GrammarAnalyzer, Terminal, LR0ItemSet
@@ -256,8 +255,8 @@ class LALR_Analyzer(GrammarAnalyzer):
raise GrammarError('Reduce/Reduce collision in %s between the following rules: %s' % (la, ''.join([ '\n\t\t- ' + str(r) for r in rules ])))
if la in actions:
if self.debug:
logging.warning('Shift/Reduce conflict for terminal %s: (resolving as shift)', la.name)
logging.warning(' * %s', list(rules)[0])
logger.warning('Shift/Reduce conflict for terminal %s: (resolving as shift)', la.name)
logger.warning(' * %s', list(rules)[0])
else:
actions[la] = (Reduce, list(rules)[0])
m[state] = { k.name: v for k, v in actions.items() }


+ 16
- 13
lark/parsers/lalr_parser.py View File

@@ -7,9 +7,10 @@ from ..lexer import Token
from ..utils import Enumerator, Serialize

from .lalr_analysis import LALR_Analyzer, Shift, Reduce, IntParseTable
from .lalr_puppet import ParserPuppet

###{standalone

class LALR_Parser(object):
def __init__(self, parser_conf, debug=False):
assert all(r.options.priority is None for r in parser_conf.rules), "LALR doesn't yet support prioritization"
@@ -37,22 +38,19 @@ class LALR_Parser(object):

class _Parser:
def __init__(self, parse_table, callbacks, debug=False):
self.states = parse_table.states
self.start_states = parse_table.start_states
self.end_states = parse_table.end_states
self.parse_table = parse_table
self.callbacks = callbacks
self.debug = debug

def parse(self, seq, start, set_state=None):
def parse(self, seq, start, set_state=None, value_stack=None, state_stack=None):
token = None
stream = iter(seq)
states = self.states
states = self.parse_table.states
start_state = self.parse_table.start_states[start]
end_state = self.parse_table.end_states[start]

start_state = self.start_states[start]
end_state = self.end_states[start]

state_stack = [start_state]
value_stack = []
state_stack = state_stack or [start_state]
value_stack = value_stack or []

if set_state: set_state(start_state)

@@ -61,8 +59,12 @@ class _Parser:
try:
return states[state][token.type]
except KeyError:
expected = [s for s in states[state].keys() if s.isupper()]
raise UnexpectedToken(token, expected, state=state)
expected = {s for s in states[state].keys() if s.isupper()}
try:
puppet = ParserPuppet(self, state_stack, value_stack, start, stream, set_state)
except NameError: # For standalone parser
puppet = None
raise UnexpectedToken(token, expected, state=state, puppet=puppet)

def reduce(rule):
size = len(rule.expansion)
@@ -114,3 +116,4 @@ class _Parser:
return value_stack[-1]

###}


+ 128
- 0
lark/parsers/lalr_puppet.py View File

@@ -0,0 +1,128 @@
# This module provides a LALR puppet, which is used for debugging and error handling

from copy import deepcopy

from .lalr_analysis import Shift, Reduce
from .. import Token


class ParserPuppet(object):
"""ParserPuppet gives you advanced control over error handling when parsing with LALR.

For a simpler, more streamlined interface, see the ``on_error`` argument to ``Lark.parse()``.
"""
def __init__(self, parser, state_stack, value_stack, start, stream, set_state):
self.parser = parser
self._state_stack = state_stack
self._value_stack = value_stack
self._start = start
self._stream = stream
self._set_state = set_state

self.result = None

def feed_token(self, token):
"""Feed the parser with a token, and advance it to the next state, as if it recieved it from the lexer.

Note that ``token`` has to be an instance of ``Token``.
"""
end_state = self.parser.parse_table.end_states[self._start]
state_stack = self._state_stack
value_stack = self._value_stack

state = state_stack[-1]
action, arg = self.parser.parse_table.states[state][token.type]
assert arg != end_state

while action is Reduce:
rule = arg
size = len(rule.expansion)
if size:
s = value_stack[-size:]
del state_stack[-size:]
del value_stack[-size:]
else:
s = []

value = self.parser.callbacks[rule](s)

_action, new_state = self.parser.parse_table.states[state_stack[-1]][rule.origin.name]
assert _action is Shift
state_stack.append(new_state)
value_stack.append(value)

if state_stack[-1] == end_state:
self.result = value_stack[-1]
return self.result

state = state_stack[-1]
action, arg = self.parser.parse_table.states[state][token.type]
assert arg != end_state

assert action is Shift
state_stack.append(arg)
value_stack.append(token)

def copy(self):
"""Create a new puppet with a separate state.

Calls to feed_token() won't affect the old puppet, and vice-versa.
"""
return type(self)(
self.parser,
list(self._state_stack),
deepcopy(self._value_stack),
self._start,
self._stream,
self._set_state,
)

def __eq__(self, other):
if not isinstance(other, ParserPuppet):
return False

return (
self._state_stack == other._state_stack and
self._value_stack == other._value_stack and
self._stream == other._stream and
self._start == other._start
)

def __hash__(self):
return hash((tuple(self._state_stack), self._start))

def pretty(self):
"""Print the output of ``choices()`` in a way that's easier to read."""
out = ["Puppet choices:"]
for k, v in self.choices().items():
out.append('\t- %s -> %s' % (k, v))
out.append('stack size: %s' % len(self._state_stack))
return '\n'.join(out)

def choices(self):
"""Returns a dictionary of token types, matched to their action in the parser.

Only returns token types that are accepted by the current state.

Updated by ``feed_token()``.
"""
return self.parser.parse_table.states[self._state_stack[-1]]

def accepts(self):
accepts = set()
for t in self.choices():
new_puppet = self.copy()
try:
new_puppet.feed_token(Token(t, ''))
except KeyError:
pass
else:
accepts.add(t)
return accepts

def resume_parse(self):
"""Resume parsing from the current puppet state."""
return self.parser.parse(
self._stream, self._start, self._set_state,
self._value_stack, self._state_stack
)
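
A hedged sketch of driving the puppet by hand after a failed parse; the grammar is made up, and ``RPAR`` is the name Lark typically assigns to an anonymous ``")"`` terminal:

    from lark import Lark, Token, UnexpectedToken

    parser = Lark('start: "(" ")"', parser='lalr')

    try:
        parser.parse('(')                       # missing the closing paren
    except UnexpectedToken as e:
        puppet = e.puppet
        print(puppet.pretty())                  # inspect the available choices
        if 'RPAR' in puppet.accepts():
            puppet.feed_token(Token('RPAR', ')'))
            tree = puppet.resume_parse()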

+ 33
- 93
lark/reconstruct.py View File

@@ -1,16 +1,13 @@
from collections import defaultdict
"""Reconstruct text from a tree, based on Lark grammar"""

import unicodedata

from .tree import Tree
from .visitors import Transformer_InPlace
from .common import ParserConf
from .lexer import Token, PatternStr
from .parsers import earley
from .grammar import Rule, Terminal, NonTerminal


from .grammar import Terminal, NonTerminal

def is_discarded_terminal(t):
return t.is_term and t.filter_out
from .tree_matcher import TreeMatcher, is_discarded_terminal

def is_iter_empty(i):
try:
@@ -59,105 +56,48 @@ class WriteTokensTransformer(Transformer_InPlace):
return to_write


class MatchTree(Tree):
pass

class MakeMatchTree:
def __init__(self, name, expansion):
self.name = name
self.expansion = expansion

def __call__(self, args):
t = MatchTree(self.name, args)
t.meta.match_tree = True
t.meta.orig_expansion = self.expansion
return t

def best_from_group(seq, group_key, cmp_key):
d = {}
for item in seq:
key = group_key(item)
if key in d:
v1 = cmp_key(item)
v2 = cmp_key(d[key])
if v2 > v1:
d[key] = item
else:
d[key] = item
return list(d.values())

class Reconstructor:
def __init__(self, parser, term_subs={}):
# XXX TODO calling compile twice returns different results!
assert parser.options.maybe_placeholders == False
tokens, rules, _grammar_extra = parser.grammar.compile(parser.options.start)

self.write_tokens = WriteTokensTransformer({t.name:t for t in tokens}, term_subs)
self.rules = list(self._build_recons_rules(rules))
self.rules.reverse()

# Choose the best rule from each group of {rule => [rule.alias]}, since we only really need one derivation.
self.rules = best_from_group(self.rules, lambda r: r, lambda r: -len(r.expansion))

self.rules.sort(key=lambda r: len(r.expansion))
callbacks = {rule: rule.alias for rule in self.rules} # TODO pass callbacks through dict, instead of alias?
self.parser = earley.Parser(ParserConf(self.rules, callbacks, parser.options.start),
self._match, resolve_ambiguity=True)

def _build_recons_rules(self, rules):
expand1s = {r.origin for r in rules if r.options.expand1}

aliases = defaultdict(list)
for r in rules:
if r.alias:
aliases[r.origin].append( r.alias )

rule_names = {r.origin for r in rules}
nonterminals = {sym for sym in rule_names
if sym.name.startswith('_') or sym in expand1s or sym in aliases }

for r in rules:
recons_exp = [sym if sym in nonterminals else Terminal(sym.name)
for sym in r.expansion if not is_discarded_terminal(sym)]

# Skip self-recursive constructs
if recons_exp == [r.origin]:
continue

sym = NonTerminal(r.alias) if r.alias else r.origin

yield Rule(sym, recons_exp, alias=MakeMatchTree(sym.name, r.expansion))

for origin, rule_aliases in aliases.items():
for alias in rule_aliases:
yield Rule(origin, [Terminal(alias)], alias=MakeMatchTree(origin.name, [NonTerminal(alias)]))
yield Rule(origin, [Terminal(origin.name)], alias=MakeMatchTree(origin.name, [origin]))

def _match(self, term, token):
if isinstance(token, Tree):
return Terminal(token.data) == term
elif isinstance(token, Token):
return term == Terminal(token.type)
assert False
def _isalnum(x):
# Categories defined here: https://www.python.org/dev/peps/pep-3131/
return unicodedata.category(x) in ['Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl', 'Mn', 'Mc', 'Nd', 'Pc']

class Reconstructor(TreeMatcher):
"""
A Reconstructor that will, given a full parse Tree, generate source code.

Note:
The reconstructor cannot generate values for terminals defined by regexps. If you need to produce
discarded regexp terminals, such as newlines, use `term_subs` to provide default values for them.

Parameters:
parser: a Lark instance
term_subs: a dictionary of [Terminal name as str] to [output text as str]
"""

def __init__(self, parser, term_subs=None):
TreeMatcher.__init__(self, parser)

self.write_tokens = WriteTokensTransformer({t.name:t for t in self.tokens}, term_subs or {})

def _reconstruct(self, tree):
# TODO: ambiguity?
unreduced_tree = self.parser.parse(tree.children, tree.data) # find a full derivation
assert unreduced_tree.data == tree.data
unreduced_tree = self.match_tree(tree, tree.data)

res = self.write_tokens.transform(unreduced_tree)
for item in res:
if isinstance(item, Tree):
# TODO use orig_expansion.rulename to support templates
for x in self._reconstruct(item):
yield x
else:
yield item

def reconstruct(self, tree):
def reconstruct(self, tree, postproc=None):
x = self._reconstruct(tree)
if postproc:
x = postproc(x)
y = []
prev_item = ''
for item in x:
if prev_item and item and prev_item[-1].isalnum() and item[0].isalnum():
if prev_item and item and _isalnum(prev_item[-1]) and _isalnum(item[0]):
y.append(' ')
y.append(item)
prev_item = item


+ 19
- 13
lark/tools/nearley.py View File

@@ -1,8 +1,9 @@
"Converts between Lark and Nearley grammars. Work in progress!"
"Converts Nearley grammars to Lark"

import os.path
import sys
import codecs
import argparse


from lark import Lark, InlineTransformer
@@ -137,7 +138,7 @@ def _nearley_to_lark(g, builtin_path, n2l, js_code, folder_path, includes):
return rule_defs


def create_code_for_nearley_grammar(g, start, builtin_path, folder_path):
def create_code_for_nearley_grammar(g, start, builtin_path, folder_path, es6=False):
import js2py

emit_code = []
@@ -160,7 +161,10 @@ def create_code_for_nearley_grammar(g, start, builtin_path, folder_path):
for alias, code in n2l.alias_js_code.items():
js_code.append('%s = (%s);' % (alias, code))

emit(js2py.translate_js('\n'.join(js_code)))
if es6:
emit(js2py.translate_js6('\n'.join(js_code)))
else:
emit(js2py.translate_js('\n'.join(js_code)))
emit('class TransformNearley(Transformer):')
for alias in n2l.alias_js_code:
emit(" %s = var.get('%s').to_python()" % (alias, alias))
@@ -173,18 +177,20 @@ def create_code_for_nearley_grammar(g, start, builtin_path, folder_path):

return ''.join(emit_code)

def main(fn, start, nearley_lib):
def main(fn, start, nearley_lib, es6=False):
with codecs.open(fn, encoding='utf8') as f:
grammar = f.read()
return create_code_for_nearley_grammar(grammar, start, os.path.join(nearley_lib, 'builtin'), os.path.abspath(os.path.dirname(fn)))
return create_code_for_nearley_grammar(grammar, start, os.path.join(nearley_lib, 'builtin'), os.path.abspath(os.path.dirname(fn)), es6=es6)

def get_arg_parser():
parser = argparse.ArgumentParser('Reads a Nearley grammar (with js functions) and outputs an equivalent Lark parser.')
parser.add_argument('nearley_grammar', help='Path to the file containing the nearley grammar')
parser.add_argument('start_rule', help='Rule within the nearley grammar to make the base rule')
parser.add_argument('nearley_lib', help='Path to root directory of nearley codebase (used for including builtins)')
parser.add_argument('--es6', help='Enable experimental ES6 support', action='store_true')
return parser

if __name__ == '__main__':
if len(sys.argv) < 4:
print("Reads Nearley grammar (with js functions) outputs an equivalent lark parser.")
print("Usage: %s <nearley_grammar_path> <start_rule> <nearley_lib_path>" % sys.argv[0])
sys.exit(1)

fn, start, nearley_lib = sys.argv[1:]

print(main(fn, start, nearley_lib))
parser = get_arg_parser()
args = parser.parse_args()
print(main(fn=args.nearley_grammar, start=args.start_rule, nearley_lib=args.nearley_lib, es6=args.es6))
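
Besides the command line, the converter can be driven from Python; a hedged sketch (the file names are placeholders, and ``js2py`` plus a checkout of the nearley repository are required):

    from lark.tools.nearley import main

    python_code = main(fn='csv.ne', start='main',
                       nearley_lib='/path/to/nearley', es6=False)
    with open('csv_parser.py', 'w') as out:
        out.write(python_code)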

+ 40
- 2
lark/tools/standalone.py View File

@@ -30,10 +30,12 @@ from io import open

import codecs
import sys
import token, tokenize
import os
from pprint import pprint
from os import path
from collections import defaultdict
from functools import partial

import lark
from lark import Lark
@@ -84,14 +86,50 @@ def extract_sections(lines):
return {name:''.join(text) for name, text in sections.items()}


def strip_docstrings(line_gen):
""" Strip comments and docstrings from a file.
Based on code from: https://stackoverflow.com/questions/1769332/script-to-remove-python-comments-docstrings
"""
res = []

prev_toktype = token.INDENT
last_lineno = -1
last_col = 0

tokgen = tokenize.generate_tokens(line_gen)
for toktype, ttext, (slineno, scol), (elineno, ecol), ltext in tokgen:
if slineno > last_lineno:
last_col = 0
if scol > last_col:
res.append(" " * (scol - last_col))
if toktype == token.STRING and prev_toktype == token.INDENT:
# Docstring
res.append("#--")
elif toktype == tokenize.COMMENT:
# Comment
res.append("##\n")
else:
res.append(ttext)
prev_toktype = toktype
last_col = ecol
last_lineno = elineno

return ''.join(res)


def main(fobj, start):
lark_inst = Lark(fobj, parser="lalr", lexer="contextual", start=start)

print('# The file was automatically generated by Lark v%s' % lark.__version__)
print('__version__ = "%s"' % lark.__version__)
print()

for pyfile in EXTRACT_STANDALONE_FILES:
for i, pyfile in enumerate(EXTRACT_STANDALONE_FILES):
with open(os.path.join(_larkdir, pyfile)) as f:
print (extract_sections(f)['standalone'])
code = extract_sections(f)['standalone']
if i: # if not this file
code = strip_docstrings(partial(next, iter(code.splitlines(True))))
print(code)

data, m = lark_inst.memo_serialize([TerminalDef, Rule])
print( 'DATA = (' )


+ 40
- 5
lark/tree.py View File

@@ -4,15 +4,29 @@ except ImportError:
pass

from copy import deepcopy
from collections import OrderedDict


###{standalone
from collections import OrderedDict


class Meta:
def __init__(self):
self.empty = True


class Tree(object):
"""The main tree class.

Creates a new tree, and stores "data" and "children" in attributes of the same name.
Trees can be hashed and compared.

Parameters:
data: The name of the rule or alias
children: List of matched sub-rules and terminals
meta: Line & Column numbers (if ``propagate_positions`` is enabled).
meta attributes: line, column, start_pos, end_line, end_column, end_pos
"""
def __init__(self, data, children, meta=None):
self.data = data
self.children = children
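A quick sketch of constructing and inspecting trees by hand (the rule names and tokens are made up):

    from lark import Tree, Token

    t = Tree('add', [Tree('number', [Token('INT', '1')]),
                     Tree('number', [Token('INT', '2')])])
    print(t.pretty())                      # indented view, handy for debugging
    numbers = list(t.find_data('number'))  # all subtrees whose .data == 'number'
    assert len(numbers) == 2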
@@ -44,6 +58,10 @@ class Tree(object):
return l

def pretty(self, indent_str=' '):
"""Returns an indented string representation of the tree.

Great for debugging.
"""
return ''.join(self._pretty(0, indent_str))

def __eq__(self, other):
@@ -59,6 +77,10 @@ class Tree(object):
return hash((self.data, tuple(self.children)))

def iter_subtrees(self):
"""Depth-first iteration.

Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG).
"""
queue = [self]
subtrees = OrderedDict()
for subtree in queue:
@@ -70,11 +92,11 @@ class Tree(object):
return reversed(list(subtrees.values()))

def find_pred(self, pred):
"Find all nodes where pred(tree) == True"
"""Returns all nodes of the tree that evaluate pred(node) as true."""
return filter(pred, self.iter_subtrees())

def find_data(self, data):
"Find all nodes where tree.data == data"
"""Returns all nodes of the tree whose data equals the given data."""
return self.find_pred(lambda t: t.data == data)

###}
@@ -95,6 +117,10 @@ class Tree(object):
yield c

def iter_subtrees_topdown(self):
"""Breadth-first iteration.

Iterates over all the subtrees, returning nodes in the same order as ``pretty()`` does.
"""
stack = [self]
while stack:
node = stack.pop()
@@ -105,7 +131,7 @@ class Tree(object):
stack.append(n)

def __deepcopy__(self, memo):
return type(self)(self.data, deepcopy(self.children, memo))
return type(self)(self.data, deepcopy(self.children, memo), meta=self._meta)

def copy(self):
return type(self)(self.data, self.children)
@@ -134,6 +160,15 @@ class SlottedTree(Tree):


def pydot__tree_to_png(tree, filename, rankdir="LR", **kwargs):
graph = pydot__tree_to_graph(tree, rankdir, **kwargs)
graph.write_png(filename)


def pydot__tree_to_dot(tree, filename, rankdir="LR", **kwargs):
graph = pydot__tree_to_graph(tree, rankdir, **kwargs)
graph.write(filename)

def pydot__tree_to_graph(tree, rankdir="LR", **kwargs):
"""Creates a colorful image that represents the tree (data+children, without meta)

Possible values for `rankdir` are "TB", "LR", "BT", "RL", corresponding to
@@ -171,4 +206,4 @@ def pydot__tree_to_png(tree, filename, rankdir="LR", **kwargs):
return node

_to_pydot(tree)
graph.write_png(filename)
return graph
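
The pydot export helpers can be used like this; a hedged sketch that assumes the optional ``pydot`` package is installed, with placeholder file names:

    from lark import Lark
    from lark.tree import pydot__tree_to_png, pydot__tree_to_dot

    tree = Lark('start: "a" "b"', keep_all_tokens=True).parse('ab')
    pydot__tree_to_png(tree, 'parse_tree.png')   # render directly to PNG
    pydot__tree_to_dot(tree, 'parse_tree.dot')   # or dump the Graphviz source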

+ 177
- 0
lark/tree_matcher.py View File

@@ -0,0 +1,177 @@
"""Tree matcher based on Lark grammar"""

import re
from collections import defaultdict

from . import Tree, Token
from .common import ParserConf
from .parsers import earley
from .grammar import Rule, Terminal, NonTerminal


def is_discarded_terminal(t):
return t.is_term and t.filter_out


class _MakeTreeMatch:
def __init__(self, name, expansion):
self.name = name
self.expansion = expansion

def __call__(self, args):
t = Tree(self.name, args)
t.meta.match_tree = True
t.meta.orig_expansion = self.expansion
return t


def _best_from_group(seq, group_key, cmp_key):
d = {}
for item in seq:
key = group_key(item)
if key in d:
v1 = cmp_key(item)
v2 = cmp_key(d[key])
if v2 > v1:
d[key] = item
else:
d[key] = item
return list(d.values())


def _best_rules_from_group(rules):
rules = _best_from_group(rules, lambda r: r, lambda r: -len(r.expansion))
rules.sort(key=lambda r: len(r.expansion))
return rules


def _match(term, token):
if isinstance(token, Tree):
name, _args = parse_rulename(term.name)
return token.data == name
elif isinstance(token, Token):
return term == Terminal(token.type)
assert False


def make_recons_rule(origin, expansion, old_expansion):
return Rule(origin, expansion, alias=_MakeTreeMatch(origin.name, old_expansion))


def make_recons_rule_to_term(origin, term):
return make_recons_rule(origin, [Terminal(term.name)], [term])


def parse_rulename(s):
"Parse rule names that may contain a template syntax (like rule{a, b, ...})"
name, args_str = re.match(r'(\w+)(?:{(.+)})?', s).groups()
args = args_str and [a.strip() for a in args_str.split(',')]
return name, args


class TreeMatcher:
"""Match the elements of a tree node, based on an ontology
provided by a Lark grammar.

Supports templates and inlined rules (`rule{a, b, ...}` and `_rule`).

Initialize with an instance of Lark.
"""

def __init__(self, parser):
# XXX TODO calling compile twice returns different results!
assert parser.options.maybe_placeholders == False
self.tokens, rules, _extra = parser.grammar.compile(parser.options.start)

self.rules_for_root = defaultdict(list)

self.rules = list(self._build_recons_rules(rules))
self.rules.reverse()

# Choose the best rule from each group of {rule => [rule.alias]}, since we only really need one derivation.
self.rules = _best_rules_from_group(self.rules)

self.parser = parser
self._parser_cache = {}

def _build_recons_rules(self, rules):
"Convert tree-parsing/construction rules to tree-matching rules"
expand1s = {r.origin for r in rules if r.options.expand1}

aliases = defaultdict(list)
for r in rules:
if r.alias:
aliases[r.origin].append(r.alias)

rule_names = {r.origin for r in rules}
nonterminals = {sym for sym in rule_names
if sym.name.startswith('_') or sym in expand1s or sym in aliases}

seen = set()
for r in rules:
recons_exp = [sym if sym in nonterminals else Terminal(sym.name)
for sym in r.expansion if not is_discarded_terminal(sym)]

# Skip self-recursive constructs
if recons_exp == [r.origin] and r.alias is None:
continue

sym = NonTerminal(r.alias) if r.alias else r.origin
rule = make_recons_rule(sym, recons_exp, r.expansion)

if sym in expand1s and len(recons_exp) != 1:
self.rules_for_root[sym.name].append(rule)

if sym.name not in seen:
yield make_recons_rule_to_term(sym, sym)
seen.add(sym.name)
else:
if sym.name.startswith('_') or sym in expand1s:
yield rule
else:
self.rules_for_root[sym.name].append(rule)

for origin, rule_aliases in aliases.items():
for alias in rule_aliases:
yield make_recons_rule_to_term(origin, NonTerminal(alias))
yield make_recons_rule_to_term(origin, origin)

def match_tree(self, tree, rulename):
"""Match the elements of `tree` to the symbols of rule `rulename`.

Parameters:
tree (Tree): the tree node to match
rulename (str): The expected full rule name (including template args)

Returns:
Tree: an unreduced tree that matches `rulename`

Raises:
UnexpectedToken: If no match was found.

Note:
It is the caller's responsibility to match the tree recursively.
"""
if rulename:
# validate
name, _args = parse_rulename(rulename)
assert tree.data == name
else:
rulename = tree.data

# TODO: ambiguity?
try:
parser = self._parser_cache[rulename]
except KeyError:
rules = self.rules + _best_rules_from_group(self.rules_for_root[rulename])

# TODO pass callbacks through dict, instead of alias?
callbacks = {rule: rule.alias for rule in rules}
conf = ParserConf(rules, callbacks, [rulename])
parser = earley.Parser(conf, _match, resolve_ambiguity=True)
self._parser_cache[rulename] = parser

# find a full derivation
unreduced_tree = parser.parse(tree.children, rulename)
assert unreduced_tree.data == rulename
return unreduced_tree
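
A hedged sketch of driving TreeMatcher by hand, roughly the way the reconstructor uses it; the grammar is illustrative:

from lark import Lark
from lark.tree_matcher import TreeMatcher

parser = Lark('start: "a" b\nb: "b"', parser='lalr', maybe_placeholders=False)
tree = parser.parse("ab")

matcher = TreeMatcher(parser)
unreduced = matcher.match_tree(tree, 'start')
# `unreduced` is an unreduced Tree; its meta.orig_expansion records the matched grammar symbols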

+ 81
- 49
lark/utils.py View File

@@ -4,51 +4,15 @@ from functools import reduce
from ast import literal_eval
from collections import deque

class fzset(frozenset):
def __repr__(self):
return '{%s}' % ', '.join(map(repr, self))


def classify_bool(seq, pred):
true_elems = []
false_elems = []

for elem in seq:
if pred(elem):
true_elems.append(elem)
else:
false_elems.append(elem)

return true_elems, false_elems



def bfs(initial, expand):
open_q = deque(list(initial))
visited = set(open_q)
while open_q:
node = open_q.popleft()
yield node
for next_node in expand(node):
if next_node not in visited:
visited.add(next_node)
open_q.append(next_node)


###{standalone
import logging
logger = logging.getLogger("lark")
logger.addHandler(logging.StreamHandler())
# Set to highest level, since we have some warnings amongst the code
# By default, we should not output any log messages
logger.setLevel(logging.CRITICAL)


def _serialize(value, memo):
if isinstance(value, Serialize):
return value.serialize(memo)
elif isinstance(value, list):
return [_serialize(elem, memo) for elem in value]
elif isinstance(value, frozenset):
return list(value) # TODO reversible?
elif isinstance(value, dict):
return {key:_serialize(elem, memo) for key, elem in value.items()}
return value

###{standalone
def classify(seq, key=None, value=None):
d = {}
for item in seq:
@@ -165,16 +129,31 @@ def smart_decorator(f, create_decorator):
else:
return create_decorator(f.__func__.__call__, True)

try:
import regex
except ImportError:
regex = None

import sys, re
Py36 = (sys.version_info[:2] >= (3, 6))

import sre_parse
import sre_constants
def get_regexp_width(regexp):
categ_pattern = re.compile(r'\\p{[A-Za-z_]+}')
def get_regexp_width(expr):
if regex:
# Since `sre_parse` cannot deal with Unicode categories of the form `\p{Mn}`, we replace these with
# a simple letter, which makes no difference since we only need the possible lengths of the
# regex match.
regexp_final = re.sub(categ_pattern, 'A', expr)
else:
if re.search(categ_pattern, expr):
raise ImportError('`regex` module must be installed in order to use Unicode categories.', expr)
regexp_final = expr
try:
return [int(x) for x in sre_parse.parse(regexp).getwidth()]
return [int(x) for x in sre_parse.parse(regexp_final).getwidth()]
except sre_constants.error:
raise ValueError(regexp)
raise ValueError(expr)

###}
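
Illustrative calls with plain patterns (no Unicode categories, so the `regex` fallback is not needed):

from lark.utils import get_regexp_width

get_regexp_width(r'ab?')        # -> [1, 2]
get_regexp_width(r'\d{2,4}')    # -> [2, 4]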

@@ -182,7 +161,7 @@ def get_regexp_width(regexp):
def dedup_list(l):
"""Given a list (l) will removing duplicates from the list,
preserving the original order of the list. Assumes that
the list entrie are hashable."""
the list entries are hashable."""
dedup = set()
return [ x for x in l if not (x in dedup or dedup.add(x))]
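
For example:

from lark.utils import dedup_list

dedup_list([3, 1, 3, 2, 1])   # -> [3, 1, 2]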

@@ -287,7 +266,60 @@ def combine_alternatives(lists):
return reduce(lambda a,b: [i+[j] for i in a for j in b], lists[1:], init)



class FS:
open = open
exists = os.path.exists
exists = os.path.exists


def isascii(s):
""" str.isascii only exists in python3.7+ """
try:
return s.isascii()
except AttributeError:
try:
s.encode('ascii')
return True
except (UnicodeDecodeError, UnicodeEncodeError):
return False
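
For example:

from lark.utils import isascii

isascii("hello")   # -> True
isascii("héllo")   # -> False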


class fzset(frozenset):
def __repr__(self):
return '{%s}' % ', '.join(map(repr, self))


def classify_bool(seq, pred):
true_elems = []
false_elems = []

for elem in seq:
if pred(elem):
true_elems.append(elem)
else:
false_elems.append(elem)

return true_elems, false_elems


def bfs(initial, expand):
open_q = deque(list(initial))
visited = set(open_q)
while open_q:
node = open_q.popleft()
yield node
for next_node in expand(node):
if next_node not in visited:
visited.add(next_node)
open_q.append(next_node)
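
A small illustration of bfs over a made-up adjacency dict:

from lark.utils import bfs

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
list(bfs(['a'], lambda n: graph[n]))   # -> ['a', 'b', 'c', 'd']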


def _serialize(value, memo):
if isinstance(value, Serialize):
return value.serialize(memo)
elif isinstance(value, list):
return [_serialize(elem, memo) for elem in value]
elif isinstance(value, frozenset):
return list(value) # TODO reversible?
elif isinstance(value, dict):
return {key:_serialize(elem, memo) for key, elem in value.items()}
return value

+ 75
- 14
lark/visitors.py View File

@@ -9,11 +9,16 @@ from .lexer import Token
from inspect import getmembers, getmro

class Discard(Exception):
"""When raising the Discard exception in a transformer callback,
that node is discarded and won't appear in the parent.
"""
pass
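
A minimal sketch of discarding nodes; the rule name `comment` is made up:

from lark import Transformer, Discard

class RemoveComments(Transformer):
    def comment(self, children):
        raise Discard   # this 'comment' node will not appear in its parent's children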

# Transformers

class _Decoratable:
"Provides support for decorating methods with @v_args"

@classmethod
def _apply_decorator(cls, decorator, **kwargs):
mro = getmro(cls)
@@ -40,12 +45,31 @@ class _Decoratable:


class Transformer(_Decoratable):
"""Visits the tree recursively, starting with the leaves and finally the root (bottom-up)
"""Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.

Calls its methods (provided by user via inheritance) according to tree.data
Calls its methods (provided by user via inheritance) according to ``tree.data``.
The returned value replaces the old one in the structure.

Can be used to implement map or reduce.
They work bottom-up (or depth-first), starting with the leaves and ending at the root of the tree.
Transformers can be used to implement map & reduce patterns. Because nodes are reduced from leaf to root,
at any point the callbacks may assume the children have already been transformed (if applicable).

``Transformer`` can do anything ``Visitor`` can do, but because it reconstructs the tree,
it is slightly less efficient.

All these classes implement the transformer interface:

- ``Transformer`` - Recursively transforms the tree. This is the one you probably want.
- ``Transformer_InPlace`` - Non-recursive. Changes the tree in-place instead of returning new instances
- ``Transformer_InPlaceRecursive`` - Recursive. Changes the tree in-place instead of returning new instances

Parameters:
visit_tokens: By default, transformers only visit rules.
visit_tokens=True will tell ``Transformer`` to visit tokens
as well. This is a slightly slower alternative to lexer_callbacks
but it's easier to maintain and works for all algorithms
(even when there isn't a lexer).

"""
__visit_tokens__ = True # For backwards compatibility
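
A hedged usage sketch; the grammar and method names are illustrative:

from lark import Lark, Transformer

class SumTree(Transformer):
    def INT(self, token):        # token callback, visited when visit_tokens is enabled
        return int(token)
    def add(self, children):     # children have already been transformed into ints
        return sum(children)

parser = Lark('start: add\nadd: INT "+" INT\n%import common.INT\n%ignore " "',
              parser='lalr')
SumTree().transform(parser.parse("2 + 3"))   # -> Tree('start', [5])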

@@ -108,11 +132,19 @@ class Transformer(_Decoratable):
return TransformerChain(self, other)

def __default__(self, data, children, meta):
"Default operation on tree (for override)"
"""Default operation on tree (for override)

Function that is called if a method with a corresponding name has not been found.
Defaults to reconstructing the Tree.
"""
return Tree(data, children, meta)

def __default_token__(self, token):
"Default operation on token (for override)"
"""Default operation on token (for override)

Function that is called if a method with a corresponding name has not been found.
Defaults to returning the argument unchanged.
"""
return token


@@ -209,10 +241,10 @@ class VisitorBase:


class Visitor(VisitorBase):
"""Bottom-up visitor, non-recursive
"""Bottom-up visitor, non-recursive.

Visits the tree, starting with the leaves and finally the root (bottom-up)
Calls its methods (provided by user via inheritance) according to tree.data
Calls its methods (provided by user via inheritance) according to ``tree.data``
"""

def visit(self, tree):
@@ -225,11 +257,12 @@ class Visitor(VisitorBase):
self._call_userfunc(subtree)
return tree
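
A short sketch of a side-effect-only Visitor; the names are illustrative:

from lark import Lark, Visitor

class CollectWords(Visitor):
    def __init__(self):
        self.words = []
    def word(self, tree):                        # called once per 'word' node
        self.words.append(str(tree.children[0]))

parser = Lark('start: word+\nword: WORD\n%import common.WORD\n%ignore " "',
              parser='lalr')
v = CollectWords()
v.visit(parser.parse("foo bar"))
# v.words now contains 'foo' and 'bar' (visited bottom-up)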


class Visitor_Recursive(VisitorBase):
"""Bottom-up visitor, recursive
"""Bottom-up visitor, recursive.

Visits the tree, starting with the leaves and finally the root (bottom-up)
Calls its methods (provided by user via inheritance) according to tree.data
Calls its methods (provided by user via inheritance) according to ``tree.data``
"""

def visit(self, tree):
@@ -261,13 +294,15 @@ def visit_children_decor(func):


class Interpreter(_Decoratable):
"""Top-down visitor, recursive
"""Interpreter walks the tree starting at the root.

Visits the tree, starting with the root and finally the leaves (top-down)
Calls its methods (provided by user via inheritance) according to tree.data

Unlike Transformer and Visitor, the Interpreter doesn't automatically visit its sub-branches.
The user has to explicitly call visit_children, or use the @visit_children_decor
For each tree node, it calls its methods (provided by user via inheritance) according to ``tree.data``.

Unlike ``Transformer`` and ``Visitor``, the Interpreter doesn't automatically visit its sub-branches.
The user has to explicitly call ``visit``, ``visit_children``, or use the ``@visit_children_decor`` decorator.
This allows the user to implement branching and loops.
"""

def visit(self, tree):
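
A hedged sketch in which the Interpreter controls its own descent; grammar and names are illustrative:

from lark import Lark
from lark.visitors import Interpreter

class CountWords(Interpreter):
    def start(self, tree):
        return sum(self.visit_children(tree))   # descend explicitly
    def word(self, tree):
        return 1

parser = Lark('start: word+\nword: WORD\n%import common.WORD\n%ignore " "',
              parser='lalr')
CountWords().visit(parser.parse("hello brave world"))   # -> 3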
@@ -350,8 +385,34 @@ def _vargs_meta(f, data, children, meta):
def _vargs_tree(f, data, children, meta):
return f(Tree(data, children, meta))


def v_args(inline=False, meta=False, tree=False, wrapper=None):
"A convenience decorator factory, for modifying the behavior of user-supplied visitor methods"
"""A convenience decorator factory for modifying the behavior of user-supplied visitor methods.

By default, callback methods of transformers/visitors accept one argument - a list of the node's children.

``v_args`` can modify this behavior. When used on a transformer/visitor class definition,
it applies to all the callback methods inside it.

Parameters:
inline: Children are provided as ``*args`` instead of a list argument (not recommended for very long lists).
meta: Provides two arguments: ``children`` and ``meta`` (instead of just the first)
tree: Provides the entire tree as the argument, instead of the children.

Example:
::

@v_args(inline=True)
class SolveArith(Transformer):
def add(self, left, right):
return left + right


class ReverseNotation(Transformer_InPlace):
@v_args(tree=True)
def tree_node(self, tree):
tree.children = tree.children[::-1]
"""
if tree and (meta or inline):
raise ValueError("Visitor functions cannot combine 'tree' with 'meta' or 'inline'.")



+ 0
- 14
mkdocs.yml View File

@@ -1,14 +0,0 @@
site_name: Lark
theme: readthedocs
pages:
- Main Page: index.md
- Philosophy: philosophy.md
- Features: features.md
- Parsers: parsers.md
- How To Use (Guide): how_to_use.md
- How To Develop (Guide): how_to_develop.md
- Grammar Reference: grammar.md
- Tree Construction Reference: tree_construction.md
- Visitors and Transformers: visitors.md
- Classes Reference: classes.md
- Recipes: recipes.md

+ 7
- 5
readthedocs.yml View File

@@ -1,10 +1,12 @@
version: 2

mkdocs:
configuration: mkdocs.yml
fail_on_warning: false

formats: all

python:
version: 3.5
version: 3.7
install:
- requirements: docs/requirements.txt

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py

+ 10
- 2
setup.py View File

@@ -1,4 +1,7 @@
import re
try:
import regex as re
except ImportError:
import re
from setuptools import find_packages, setup

__version__ ,= re.findall('__version__ = "(.*)"', open('lark/__init__.py').read())
@@ -6,11 +9,16 @@ __version__ ,= re.findall('__version__ = "(.*)"', open('lark/__init__.py').read(
setup(
name = "lark-parser",
version = __version__,
packages = ['lark', 'lark.parsers', 'lark.tools', 'lark.grammars', 'lark-stubs'],
packages = ['lark', 'lark.parsers', 'lark.tools', 'lark.grammars', 'lark.__pyinstaller', 'lark-stubs'],

requires = [],
install_requires = [],

extras_require = {
"regex": ["regex"],
"nearley": ["js2py"]
},

package_data = {'': ['*.md', '*.lark'], 'lark-stubs': ['*.pyi']},

test_suite = 'tests.__main__',


nearley-requirements.txt → test-requirements.txt View File

@@ -1 +1,2 @@
Js2Py==0.68
regex

+ 6
- 2
tests/__main__.py View File

@@ -2,20 +2,24 @@ from __future__ import absolute_import, print_function

import unittest
import logging
from lark import logger

from .test_trees import TestTrees
from .test_tools import TestStandalone
from .test_cache import TestCache
from .test_grammar import TestGrammar
from .test_reconstructor import TestReconstructor

try:
from .test_nearley.test_nearley import TestNearley
except ImportError:
logging.warning("Warning: Skipping tests for Nearley grammar imports (js2py required)")
logger.warning("Warning: Skipping tests for Nearley grammar imports (js2py required)")

# from .test_selectors import TestSelectors
# from .test_grammars import TestPythonG, TestConfigG

from .test_logger import Testlogger

from .test_parser import (
TestLalrStandard,
TestEarleyStandard,
@@ -31,7 +35,7 @@ from .test_parser import (
TestParsers,
)

logging.basicConfig(level=logging.INFO)
logger.setLevel(logging.INFO)

if __name__ == '__main__':
unittest.main()

+ 16
- 0
tests/test_cache.py View File

@@ -4,6 +4,7 @@ import sys
from unittest import TestCase, main

from lark import Lark, Tree
from lark.lexer import Lexer, Token
import lark.lark as lark_module

try:
@@ -38,6 +39,15 @@ class MockFS:
return name in self.files


class CustomLexer(Lexer):
def __init__(self, lexer_conf):
pass

def lex(self, data):
for obj in data:
yield Token('A', obj)


class TestCache(TestCase):
def setUp(self):
pass
@@ -70,6 +80,12 @@ class TestCache(TestCase):
parser = Lark(g, parser='lalr', cache=True)
assert parser.parse('a') == Tree('start', [])

# Test with custom lexer
mock_fs.files = {}
parser = Lark(g, parser='lalr', lexer=CustomLexer, cache=True)
parser = Lark(g, parser='lalr', lexer=CustomLexer, cache=True)
assert len(mock_fs.files) == 1
assert parser.parse('a') == Tree('start', [])
finally:
lark_module.FS = fs



+ 31
- 0
tests/test_grammar.py View File

@@ -0,0 +1,31 @@
from __future__ import absolute_import

import sys
from unittest import TestCase, main

from lark import Lark
from lark.load_grammar import GrammarLoader, GrammarError


class TestGrammar(TestCase):
def setUp(self):
pass

def test_errors(self):
for msg, examples in GrammarLoader.ERRORS:
for example in examples:
try:
p = Lark(example)
except GrammarError as e:
assert msg in str(e)
else:
assert False, "example did not raise an error"




if __name__ == '__main__':
main()




+ 65
- 0
tests/test_logger.py View File

@@ -0,0 +1,65 @@
import logging
from contextlib import contextmanager
from lark import Lark, logger
from unittest import TestCase, main

try:
from StringIO import StringIO
except ImportError:
from io import StringIO

@contextmanager
def capture_log():
stream = StringIO()
orig_handler = logger.handlers[0]
del logger.handlers[:]
logger.addHandler(logging.StreamHandler(stream))
yield stream
del logger.handlers[:]
logger.addHandler(orig_handler)

class Testlogger(TestCase):

def test_debug(self):
logger.setLevel(logging.DEBUG)
collision_grammar = '''
start: as as
as: a*
a: "a"
'''
with capture_log() as log:
Lark(collision_grammar, parser='lalr', debug=True)

log = log.getvalue()
# since there are conflicts about A,
# symbol A should appear in the log message as a hint
self.assertIn("A", log)

def test_non_debug(self):
logger.setLevel(logging.DEBUG)
collision_grammar = '''
start: as as
as: a*
a: "a"
'''
with capture_log() as log:
Lark(collision_grammar, parser='lalr', debug=False)
log = log.getvalue()
# no log message
self.assertEqual(len(log), 0)

def test_loglevel_higher(self):
logger.setLevel(logging.ERROR)
collision_grammar = '''
start: as as
as: a*
a: "a"
'''
with capture_log() as log:
Lark(collision_grammar, parser='lalr', debug=True)
log = log.getvalue()
# no log message
self.assertEqual(len(log), 0)

if __name__ == '__main__':
main()

+ 4
- 3
tests/test_nearley/test_nearley.py View File

@@ -6,16 +6,17 @@ import logging
import os
import codecs

logging.basicConfig(level=logging.INFO)

from lark import logger
from lark.tools.nearley import create_code_for_nearley_grammar, main as nearley_tool_main

logger.setLevel(logging.INFO)

TEST_PATH = os.path.abspath(os.path.dirname(__file__))
NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley')
BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin')

if not os.path.exists(NEARLEY_PATH):
logging.warn("Nearley not installed. Skipping Nearley tests!")
logger.warn("Nearley not installed. Skipping Nearley tests!")
raise ImportError("Skipping Nearley tests!")

import js2py # Ensures that js2py exists, to avoid failing tests


+ 392
- 3
tests/test_parser.py View File

@@ -6,7 +6,12 @@ import unittest
import logging
import os
import sys
from copy import deepcopy
from copy import copy, deepcopy

from lark.utils import Py36, isascii

from lark import Token

try:
from cStringIO import StringIO as cStringIO
except ImportError:
@@ -18,8 +23,13 @@ from io import (
open,
)

logging.basicConfig(level=logging.INFO)

try:
import regex
except ImportError:
regex = None

from lark import logger
from lark.lark import Lark
from lark.exceptions import GrammarError, ParseError, UnexpectedToken, UnexpectedInput, UnexpectedCharacters
from lark.tree import Tree
@@ -27,6 +37,7 @@ from lark.visitors import Transformer, Transformer_InPlace, v_args
from lark.grammar import Rule
from lark.lexer import TerminalDef, Lexer, TraditionalLexer

logger.setLevel(logging.INFO)


__path__ = os.path.dirname(__file__)
@@ -449,6 +460,221 @@ def _make_full_earley_test(LEXER):
])
self.assertEqual(res, expected)

def test_ambiguous_intermediate_node(self):
grammar = """
start: ab bc d?
!ab: "A" "B"?
!bc: "B"? "C"
!d: "D"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCD")
expected = {
Tree('start', [Tree('ab', ['A']), Tree('bc', ['B', 'C']), Tree('d', ['D'])]),
Tree('start', [Tree('ab', ['A', 'B']), Tree('bc', ['C']), Tree('d', ['D'])])
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_ambiguous_symbol_and_intermediate_nodes(self):
grammar = """
start: ab bc cd
!ab: "A" "B"?
!bc: "B"? "C"?
!cd: "C"? "D"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCD")
expected = {
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', ['C']),
Tree('cd', ['D'])
]),
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', []),
Tree('cd', ['C', 'D'])
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B', 'C']),
Tree('cd', ['D'])
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B']),
Tree('cd', ['C', 'D'])
]),
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_nested_ambiguous_intermediate_nodes(self):
grammar = """
start: ab bc cd e?
!ab: "A" "B"?
!bc: "B"? "C"?
!cd: "C"? "D"
!e: "E"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCDE")
expected = {
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', ['C']),
Tree('cd', ['D']),
Tree('e', ['E'])
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B', 'C']),
Tree('cd', ['D']),
Tree('e', ['E'])
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B']),
Tree('cd', ['C', 'D']),
Tree('e', ['E'])
]),
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', []),
Tree('cd', ['C', 'D']),
Tree('e', ['E'])
]),
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_nested_ambiguous_intermediate_nodes2(self):
grammar = """
start: ab bc cd de f
!ab: "A" "B"?
!bc: "B"? "C"?
!cd: "C"? "D"?
!de: "D"? "E"
!f: "F"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCDEF")
expected = {
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', ['C']),
Tree('cd', ['D']),
Tree('de', ['E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B', 'C']),
Tree('cd', ['D']),
Tree('de', ['E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B']),
Tree('cd', ['C', 'D']),
Tree('de', ['E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B']),
Tree('cd', ['C']),
Tree('de', ['D', 'E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A', "B"]),
Tree('bc', []),
Tree('cd', ['C']),
Tree('de', ['D', 'E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A']),
Tree('bc', ['B', 'C']),
Tree('cd', []),
Tree('de', ['D', 'E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', []),
Tree('cd', ['C', 'D']),
Tree('de', ['E']),
Tree('f', ['F']),
]),
Tree('start', [
Tree('ab', ['A', 'B']),
Tree('bc', ['C']),
Tree('cd', []),
Tree('de', ['D', 'E']),
Tree('f', ['F']),
]),
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_ambiguous_intermediate_node_unnamed_token(self):
grammar = """
start: ab bc "D"
!ab: "A" "B"?
!bc: "B"? "C"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCD")
expected = {
Tree('start', [Tree('ab', ['A']), Tree('bc', ['B', 'C'])]),
Tree('start', [Tree('ab', ['A', 'B']), Tree('bc', ['C'])])
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_ambiguous_intermediate_node_inlined_rule(self):
grammar = """
start: ab _bc d?
!ab: "A" "B"?
_bc: "B"? "C"
!d: "D"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCD")
expected = {
Tree('start', [Tree('ab', ['A']), Tree('d', ['D'])]),
Tree('start', [Tree('ab', ['A', 'B']), Tree('d', ['D'])])
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_ambiguous_intermediate_node_conditionally_inlined_rule(self):
grammar = """
start: ab bc d?
!ab: "A" "B"?
!?bc: "B"? "C"
!d: "D"
"""

l = Lark(grammar, parser='earley', ambiguity='explicit', lexer=LEXER)
ambig_tree = l.parse("ABCD")
expected = {
Tree('start', [Tree('ab', ['A']), Tree('bc', ['B', 'C']), Tree('d', ['D'])]),
Tree('start', [Tree('ab', ['A', 'B']), 'C', Tree('d', ['D'])])
}
self.assertEqual(ambig_tree.data, '_ambig')
self.assertEqual(set(ambig_tree.children), expected)

def test_fruitflies_ambig(self):
grammar = """
start: noun verb noun -> simple
@@ -549,16 +775,88 @@ class CustomLexer(Lexer):
so it uses the TraditionalLexer as its implementation, without custom lexing behaviour.
"""
def __init__(self, lexer_conf):
self.lexer = TraditionalLexer(lexer_conf.tokens, ignore=lexer_conf.ignore, user_callbacks=lexer_conf.callbacks, g_regex_flags=lexer_conf.g_regex_flags)
self.lexer = TraditionalLexer(copy(lexer_conf))
def lex(self, *args, **kwargs):
return self.lexer.lex(*args, **kwargs)

def _tree_structure_check(a, b):
"""
Checks that both Tree objects have the same structure, without checking their values.
"""
assert a.data == b.data and len(a.children) == len(b.children)
for ca,cb in zip(a.children, b.children):
assert type(ca) == type(cb)
if isinstance(ca, Tree):
_tree_structure_check(ca, cb)
elif isinstance(ca, Token):
assert ca.type == cb.type
else:
assert ca == cb

class DualBytesLark:
"""
A helper class that wraps both a normal parser, and a parser for bytes.
It automatically forwards `.parse` calls to both parsers, returning the value from the text parser.
It always checks that both produce the same output/error.

NOTE: Not currently used, but left here for future debugging.
"""

def __init__(self, g, *args, **kwargs):
self.text_lexer = Lark(g, *args, use_bytes=False, **kwargs)
g = self.text_lexer.grammar_source.lower()
if '\\u' in g or not isascii(g):
# The bytes regex engine can't deal with unicode escapes
self.bytes_lark = None
else:
# Everything here should work, so use `use_bytes='force'`
self.bytes_lark = Lark(self.text_lexer.grammar_source, *args, use_bytes='force', **kwargs)

def parse(self, text, start=None):
# TODO: Easy workaround, more complex checks would be beneficial
if not isascii(text) or self.bytes_lark is None:
return self.text_lexer.parse(text, start)
try:
rv = self.text_lexer.parse(text, start)
except Exception as e:
try:
self.bytes_lark.parse(text.encode(), start)
except Exception as be:
assert type(e) == type(be), "Parser with and without `use_bytes` raise different exceptions"
raise e
assert False, "Parser without `use_bytes` raises exception, with doesn't"
try:
bv = self.bytes_lark.parse(text.encode(), start)
except Exception as be:
assert False, "Parser without `use_bytes` doesn't raise an exception, with does"
_tree_structure_check(rv, bv)
return rv

@classmethod
def open(cls, grammar_filename, rel_to=None, **options):
if rel_to:
basepath = os.path.dirname(rel_to)
grammar_filename = os.path.join(basepath, grammar_filename)
with open(grammar_filename, encoding='utf8') as f:
return cls(f, **options)

def save(self,f):
self.text_lexer.save(f)
if self.bytes_lark is not None:
self.bytes_lark.save(f)

def load(self,f):
self.text_lexer = self.text_lexer.load(f)
if self.bytes_lark is not None:
self.bytes_lark.load(f)

def _make_parser_test(LEXER, PARSER):
lexer_class_or_name = CustomLexer if LEXER == 'custom' else LEXER
def _Lark(grammar, **kwargs):
return Lark(grammar, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs)
def _Lark_open(gfilename, **kwargs):
return Lark.open(gfilename, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs)

class _TestParser(unittest.TestCase):
def test_basic1(self):
g = _Lark("""start: a+ b a* "b" a*
@@ -639,6 +937,29 @@ def _make_parser_test(LEXER, PARSER):
""")
g.parse('\x01\x02\x03')

@unittest.skipIf(sys.version_info[0]==2 or sys.version_info[:2]==(3, 4),
"bytes parser isn't perfect in Python2, exceptions don't work correctly")
def test_bytes_utf8(self):
g = r"""
start: BOM? char+
BOM: "\xef\xbb\xbf"
char: CHAR1 | CHAR2 | CHAR3 | CHAR4
CONTINUATION_BYTE: "\x80" .. "\xbf"
CHAR1: "\x00" .. "\x7f"
CHAR2: "\xc0" .. "\xdf" CONTINUATION_BYTE
CHAR3: "\xe0" .. "\xef" CONTINUATION_BYTE CONTINUATION_BYTE
CHAR4: "\xf0" .. "\xf7" CONTINUATION_BYTE CONTINUATION_BYTE CONTINUATION_BYTE
"""
g = _Lark(g, use_bytes=True)
s = u"🔣 地? gurīn".encode('utf-8')
self.assertEqual(len(g.parse(s).children), 10)

for enc, j in [("sjis", u"地球の絵はグリーンでグッド? Chikyuu no e wa guriin de guddo"),
("sjis", u"売春婦"),
("euc-jp", u"乂鵬鵠")]:
s = j.encode(enc)
self.assertRaises(UnexpectedCharacters, g.parse, s)

@unittest.skipIf(PARSER == 'cyk', "Takes forever")
def test_stack_for_ebnf(self):
"""Verify that stack depth isn't an issue for EBNF grammars"""
@@ -1058,6 +1379,31 @@ def _make_parser_test(LEXER, PARSER):
self.assertEqual( g.parse('"hello"').children, ['"hello"'])
self.assertEqual( g.parse("'hello'").children, ["'hello'"])

@unittest.skipIf(not Py36, "Required re syntax only exists in python3.6+")
def test_join_regex_flags(self):
g = r"""
start: A
A: B C
B: /./s
C: /./
"""
g = _Lark(g)
self.assertEqual(g.parse(" ").children,[" "])
self.assertEqual(g.parse("\n ").children,["\n "])
self.assertRaises(UnexpectedCharacters, g.parse, "\n\n")

g = r"""
start: A
A: B | C
B: "b"i
C: "c"
"""
g = _Lark(g)
self.assertEqual(g.parse("b").children,["b"])
self.assertEqual(g.parse("B").children,["B"])
self.assertEqual(g.parse("c").children,["c"])
self.assertRaises(UnexpectedCharacters, g.parse, "C")


def test_lexer_token_limit(self):
"Python has a stupid limit of 100 groups in a regular expression. Test that we handle this limitation"
@@ -1132,6 +1478,32 @@ def _make_parser_test(LEXER, PARSER):
tree = l.parse('aA')
self.assertEqual(tree.children, ['a', 'A'])

def test_token_flags_verbose(self):
g = _Lark(r"""start: NL | ABC
ABC: / [a-z] /x
NL: /\n/
""")
x = g.parse('a')
self.assertEqual(x.children, ['a'])

def test_token_flags_verbose_multiline(self):
g = _Lark(r"""start: ABC
ABC: / a b c
d
e f
/x
""")
x = g.parse('abcdef')
self.assertEqual(x.children, ['abcdef'])

def test_token_multiline_only_works_with_x_flag(self):
g = r"""start: ABC
ABC: / a b c
d
e f
/i
"""
self.assertRaises( GrammarError, _Lark, g)

@unittest.skipIf(PARSER == 'cyk', "No empty rules")
def test_twice_empty(self):
@@ -1784,6 +2156,23 @@ def _make_parser_test(LEXER, PARSER):
self.assertEqual(a.line, 1)
self.assertEqual(b.line, 2)

@unittest.skipIf(not regex or sys.version_info[0] == 2, 'Unicode and Python 2 do not play nicely together.')
def test_unicode_class(self):
"Tests that character classes from the `regex` module work correctly."
g = _Lark(r"""?start: NAME
NAME: ID_START ID_CONTINUE*
ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}]+/""", regex=True)

self.assertEqual(g.parse('வணக்கம்'), 'வணக்கம்')

@unittest.skipIf(not regex or sys.version_info[0] == 2, 'Unicode and Python 2 do not play nicely together.')
def test_unicode_word(self):
"Tests that a persistent bug in the `re` module works when `regex` is enabled."
g = _Lark(r"""?start: NAME
NAME: /[\w]+/
""", regex=True)
self.assertEqual(g.parse('வணக்கம்'), 'வணக்கம்')

_NAME = "Test" + PARSER.capitalize() + LEXER.capitalize()
_TestParser.__name__ = _NAME


Some files were not shown because too many files changed in this diff
