
Corrections to PR

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.10.0
Erez Sh 4 years ago
parent
commit
288078a6a0
8 changed files with 154 additions and 169 deletions
  1. docs/classes.rst (+23, -4)
  2. docs/index.rst (+2, -2)
  3. docs/visitors.rst (+23, -7)
  4. lark/exceptions.py (+5, -4)
  5. lark/lark.py (+52, -66)
  6. lark/parsers/lalr_puppet.py (+6, -9)
  7. lark/tree.py (+7, -10)
  8. lark/visitors.py (+36, -67)

docs/classes.rst (+23, -4)

@@ -7,10 +7,29 @@ Lark
.. autoclass:: lark.Lark
   :members: open, parse, save, load


LarkOptions
-----------
**Using Unicode character classes with regex**


.. autoclass:: lark.lark.LarkOptions
Python's builtin `re` module has a few persistent known bugs and also won't parse
advanced regex features such as Unicode character classes (``\p{...}``).
With `pip install lark-parser[regex]`, the `regex` module will be installed alongside `lark` and can act as a drop-in replacement for `re`.

Any instance of `Lark` instantiated with `regex=True` will now use the `regex` module instead of `re`.

For example, we can now use character classes to match PEP-3131 compliant Python identifiers.

Example:
::

    >>> from lark import Lark
    >>> g = Lark(r"""
    ... ?start: NAME
    ... NAME: ID_START ID_CONTINUE*
    ... ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
    ... ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
    ... """, regex=True)

    >>> g.parse('வணக்கம்')
    'வணக்கம்'


Tree
----
@@ -24,7 +43,7 @@ Token


.. autoclass:: lark.Token


Transformer, Vistor & Interpretor
Transformer, Visitor & Interpreter
---------------------------------


See :doc:`visitors`.


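For readers without the `regex` module installed, the following standard-library-only sketch illustrates both halves of the point above: the stdlib `re` rejects `\p{...}` property classes, and the example's ID_START / ID_CONTINUE terminals can be approximated with `unicodedata.category`. The helper names are hypothetical, the category sets are copied from the example grammar, and this is not how Lark itself matches terminals.

```python
import re
import unicodedata

# Unicode general categories taken from ID_START / ID_CONTINUE in the example grammar
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def stdlib_re_supports_property_classes() -> bool:
    """Return True if the stdlib `re` module accepts \\p{...} property classes."""
    try:
        re.compile(r"[\p{Lu}\p{Ll}]+")
    except re.error:
        # `re` reports "\p" as a bad escape
        return False
    return True


def looks_like_name(text: str) -> bool:
    """Hypothetical helper: approximate NAME: ID_START ID_CONTINUE* with unicodedata."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    # "_" is listed explicitly in the grammar's ID_START class
    if first != "_" and unicodedata.category(first) not in ID_START_CATEGORIES:
        return False
    # "·" (U+00B7) is listed explicitly in the grammar's ID_CONTINUE class
    return all(
        ch in "_\u00b7" or unicodedata.category(ch) in ID_CONTINUE_CATEGORIES
        for ch in rest
    )


print(stdlib_re_supports_property_classes())  # False
print(looks_like_name("வணக்கம்"))             # True
print(looks_like_name("1abc"))                # False: "1" is category Nd
```

With `regex=True`, Lark hands patterns like the ones in the grammar to the `regex` module, which understands `\p{...}` directly, so no hand-written category table is needed.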
docs/index.rst (+2, -2)

@@ -33,10 +33,10 @@ Welcome to Lark's documentation!


grammar
tree_construction
visitors
classes
visitors
nearley




Lark is a modern parsing library for Python. Lark can parse any context-free grammar.


docs/visitors.rst (+23, -7)

@@ -17,12 +17,33 @@ See: `visitors.py`_
Visitor
-------


.. autoclass:: lark.visitors.VisitorBase
Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.

They work bottom-up, starting with the leaves and ending at the root of the tree.

There are two classes that implement the visitor interface:

- ``Visitor``: Visit every node (without recursion)
- ``Visitor_Recursive``: Visit every node using recursion. Slightly faster.

Example:
::

    class IncreaseAllNumbers(Visitor):
        def number(self, tree):
            assert tree.data == "number"
            tree.children[0] += 1

    IncreaseAllNumbers().visit(parse_tree)


.. autoclass:: lark.visitors.Visitor


.. autoclass:: lark.visitors.Visitor_Recursive


Interpreter
-----------

.. autoclass:: lark.visitors.Interpreter


Transformer
-----------
@@ -30,11 +51,6 @@ Transformer
.. autoclass:: lark.visitors.Transformer
   :members: __default__, __default_token__


Interpreter
-----------

.. autoclass:: lark.visitors.Interpreter

v_args
------


@@ -43,4 +59,4 @@ v_args
Discard
-------


.. autoclass:: lark.visitors.Discard
.. autoclass:: lark.visitors.Discard

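The bottom-up order described above can be checked with a self-contained toy that mirrors the `IncreaseAllNumbers` example without importing Lark. `ToyTree` and `ToyVisitor` are hypothetical stand-ins; collecting nodes with an explicit stack and then dispatching in reverse is one way to get a non-recursive bottom-up walk, not necessarily the way `lark.visitors.Visitor` does it.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(eq=False)
class ToyTree:
    data: str
    children: List[Any] = field(default_factory=list)


class ToyVisitor:
    """Hypothetical non-recursive, bottom-up visitor that dispatches on tree.data."""

    def visit(self, tree: ToyTree) -> ToyTree:
        # Collect nodes parent-first with an explicit stack (no recursion)
        order, stack = [], [tree]
        while stack:
            node = stack.pop()
            order.append(node)
            stack.extend(c for c in node.children if isinstance(c, ToyTree))
        # Reversing that order guarantees every child is handled before its parent
        for node in reversed(order):
            getattr(self, node.data, self.__default__)(node)
        return tree

    def __default__(self, tree: ToyTree) -> None:
        pass  # nodes without a matching method are left untouched


class IncreaseAllNumbers(ToyVisitor):
    def number(self, tree: ToyTree) -> None:
        assert tree.data == "number"
        tree.children[0] += 1


parse_tree = ToyTree("add", [ToyTree("number", [1]), ToyTree("number", [41])])
IncreaseAllNumbers().visit(parse_tree)
print([n.children[0] for n in parse_tree.children])  # [2, 42]
```

Like a Visitor, the toy mutates the tree in place and ignores method return values.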
lark/exceptions.py (+5, -4)

@@ -26,11 +26,12 @@ class UnexpectedEOF(ParseError):
class UnexpectedInput(LarkError):
"""UnexpectedInput Error.


Used as a base class for the following exceptions:

- ``UnexpectedToken``: The parser received an unexpected token
- ``UnexpectedCharacters``: The lexer encountered an unexpected string


After catching one of these exceptions, you may call the following
helper methods to create a nicer error message.
After catching one of these exceptions, you may call the following helper methods to create a nicer error message.
""" """
pos_in_stream = None pos_in_stream = None


@@ -57,7 +58,7 @@ class UnexpectedInput(LarkError):
def match_examples(self, parse_fn, examples, token_type_match_fallback=False, use_accepts=False):
"""Allows you to detect what's wrong in the input text by matching
against example errors.
Given a parser instance and a dictionary mapping some label with
some malformed syntax examples, it'll return the label for the
example that best matches the current error. The function will
@@ -66,7 +67,7 @@ class UnexpectedInput(LarkError):


For an example usage, see examples/error_reporting_lalr.py


Args:
Parameters:
parse_fn: parse function (usually ``lark_instance.parse``)
examples: dictionary of ``{'example_string': value}``.
use_accepts: Recommended to call this with ``use_accepts=True``.


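The idea behind `match_examples` can be sketched without Lark: parse each labelled malformed example, and return the label of the first one that fails in the same way as the error being reported. In the toy below, `ToyParseError`, its `state` attribute, the parenthesis-checking `toy_parse`, and the label-to-list layout of `examples` are all hypothetical; Lark's own method compares real parser states and offers extra options such as `use_accepts`.

```python
from typing import Dict, List, Optional


class ToyParseError(Exception):
    """Hypothetical error carrying a coarse 'parser state' label."""

    def __init__(self, state: str, pos: int):
        super().__init__(f"{state} at position {pos}")
        self.state = state
        self.pos = pos


def toy_parse(text: str) -> None:
    """Toy 'parser' for balanced parentheses; raises ToyParseError on failure."""
    depth = 0
    for pos, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ToyParseError("unexpected_close", pos)
            depth -= 1
    if depth:
        raise ToyParseError("unclosed", len(text))


def error_for(text: str) -> Optional[ToyParseError]:
    """Return the error raised while parsing `text`, or None if it parses."""
    try:
        toy_parse(text)
    except ToyParseError as err:
        return err
    return None


def match_examples(error, parse_fn, examples: Dict[str, List[str]]) -> Optional[str]:
    """Return the label whose example fails in the same state as `error`, else None."""
    for label, samples in examples.items():
        for sample in samples:
            try:
                parse_fn(sample)
            except ToyParseError as candidate:
                if candidate.state == error.state:
                    return label
    return None


examples = {
    "Missing closing parenthesis": ["(", "(()"],
    "Stray closing parenthesis": [")", "())"],
}
print(match_examples(error_for("((a)"), toy_parse, examples))  # Missing closing parenthesis
```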
lark/lark.py (+52, -66)

@@ -27,75 +27,67 @@ class LarkOptions(Serialize):


""" """
OPTIONS_DOC = """ OPTIONS_DOC = """
**General**
**=== General ===**
start
The start symbol. Either a string, or a list of strings for
multiple possible starts (Default: "start")
The start symbol. Either a string, or a list of strings for multiple possible starts (Default: "start")
debug
Display debug information, such as warnings (default: False)
Display debug information, such as warnings (default: False)
transformer
Applies the transformer to every parse tree (equivalent
to applying it after the parse, but faster)
Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster)
propagate_positions
Propagates (line, column, end_line, end_column) attributes into all tree branches.
Propagates (line, column, end_line, end_column) attributes into all tree branches.
maybe_placeholders
When True, the ``[]`` operator returns ``None``
when not matched. When ``False``, ``[]`` behaves like the ``?``
operator, and returns no value at all. (default= ``False``. Recommended
to set to ``True``)
When True, the ``[]`` operator returns ``None`` when not matched.
When ``False``, ``[]`` behaves like the ``?`` operator, and returns no value at all.
(default= ``False``. Recommended to set to ``True``)
regex
When True, uses the ``regex`` module instead of the
stdlib ``re``.
When True, uses the ``regex`` module instead of the stdlib ``re``.
cache
Cache the results of the Lark grammar analysis, for x2 to
x3 faster loading. LALR only for now.
Cache the results of the Lark grammar analysis, for x2 to x3 faster loading. LALR only for now.


- When ``False``, does nothing (default)
- When ``True``, caches to a temporary file in the local directory
- When given a string, caches to the path pointed by the string
- When ``False``, does nothing (default)
- When ``True``, caches to a temporary file in the local directory
- When given a string, caches to the path pointed by the string


g_regex_flags
Flags that are applied to all terminals (both regex and strings)
Flags that are applied to all terminals (both regex and strings)
keep_all_tokens
Prevent the tree builder from automagically removing "punctuation" tokens (default: False)
Prevent the tree builder from automagically removing "punctuation" tokens (default: False)


**Algorithm**
**=== Algorithm ===**


parser
Decides which parser engine to use. Accepts "earley" or "lalr".
(Default: "earley"). (there is also a "cyk" option for legacy)
Decides which parser engine to use. Accepts "earley" or "lalr". (Default: "earley").
(there is also a "cyk" option for legacy)
lexer
Decides whether or not to use a lexer stage

- "auto" (default): Choose for me based on the parser
- "standard": Use a standard lexer
- "contextual": Stronger lexer (only works with parser="lalr")
- "dynamic": Flexible and powerful (only with parser="earley")
- "dynamic_complete": Same as dynamic, but tries *every* variation
of tokenizing possible.
Decides whether or not to use a lexer stage

- "auto" (default): Choose for me based on the parser
- "standard": Use a standard lexer
- "contextual": Stronger lexer (only works with parser="lalr")
- "dynamic": Flexible and powerful (only with parser="earley")
- "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible.
ambiguity
Decides how to handle ambiguity in the parse. Only relevant if parser="earley"
- "resolve" - The parser will automatically choose the simplest
derivation (it chooses consistently: greedy for tokens,
non-greedy for rules)
- "explicit": The parser will return all derivations wrapped in
"_ambig" tree nodes (i.e. a forest).
Decides how to handle ambiguity in the parse. Only relevant if parser="earley"


**Domain Specific**
- "resolve" - The parser will automatically choose the simplest derivation
(it chooses consistently: greedy for tokens, non-greedy for rules)
- "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest).

**=== Misc. / Domain Specific ===**


postlex
Lexer post-processing (Default: None) Only works with the
standard and contextual lexers.
Lexer post-processing (Default: None) Only works with the standard and contextual lexers.
priority
How priorities should be evaluated - auto, none, normal, invert (Default: auto)
How priorities should be evaluated - auto, none, normal, invert (Default: auto)
lexer_callbacks
Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
use_bytes
Accept an input of type ``bytes`` instead of ``str`` (Python 3 only).
Accept an input of type ``bytes`` instead of ``str`` (Python 3 only).
edit_terminals
A callback
A callback for editing the terminals before parse.
""" """
if __doc__: if __doc__:
__doc__ += OPTIONS_DOC __doc__ += OPTIONS_DOC
@@ -170,13 +162,11 @@ class LarkOptions(Serialize):
class Lark(Serialize):
"""Main interface for the library.


It's mostly a thin wrapper for the many different parsers, and for
the tree constructor.
It's mostly a thin wrapper for the many different parsers, and for the tree constructor.


Args:
grammar: a string or file-object containing the
grammar spec (using Lark's ebnf syntax)
options : a dictionary controlling various aspects of Lark.
Parameters:
grammar: a string or file-object containing the grammar spec (using Lark's ebnf syntax)
options: a dictionary controlling various aspects of Lark.


Example:
>>> Lark(r'''start: "foo" ''')
@@ -317,8 +307,7 @@ class Lark(Serialize):
self.save(f)


# TODO: merge with above
if __init__.__doc__:
__init__.__doc__ += "\nOptions:\n" + LarkOptions.OPTIONS_DOC
__doc__ += "\nOptions:\n" + LarkOptions.OPTIONS_DOC


__serialize_fields__ = 'parser', 'rules', 'options'


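Both `LarkOptions` and `Lark` extend a class-level `__doc__` with the shared OPTIONS_DOC text from inside the class body. This works because a class body runs as ordinary code, with the class docstring already bound to `__doc__`. A minimal sketch of the pattern follows; the class and constant names are hypothetical, and the `if __doc__:` guard (as written in the `LarkOptions` hunk) skips the concatenation under `python -OO`, where docstrings are stripped.

```python
SHARED_OPTIONS_DOC = """
Options:
    start -- the start symbol (default: "start")
    debug -- display debug information (default: False)
"""


class ToyParser:
    """Main interface of a hypothetical parser."""

    # The class body executes top to bottom with the docstring already bound to
    # __doc__, so it can be extended in place before the class object is created.
    # Under `python -OO` docstrings are stripped, and the guard skips the "+=".
    if __doc__:
        __doc__ += SHARED_OPTIONS_DOC


print(ToyParser.__doc__)
```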
@@ -391,8 +380,7 @@ class Lark(Serialize):
def open(cls, grammar_filename, rel_to=None, **options):
"""Create an instance of Lark with the grammar given by its filename


If ``rel_to`` is provided, the function will find the grammar
filename in relation to it.
If ``rel_to`` is provided, the function will find the grammar filename in relation to it.


Example:


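The docstring's own example falls outside this hunk. As a hedged illustration of the `rel_to` behavior described above: passing a module's `__file__` lets a grammar stored next to that module be found regardless of the current working directory. The sketch below assumes the resolution amounts to joining the filename onto `os.path.dirname(rel_to)`; the helper name is hypothetical.

```python
import os
from typing import Optional


def resolve_grammar_path(grammar_filename: str, rel_to: Optional[str] = None) -> str:
    """Hypothetical helper: resolve a grammar filename relative to another file."""
    if rel_to:
        # Use the directory containing `rel_to` (e.g. a module's __file__) as the base
        grammar_filename = os.path.join(os.path.dirname(rel_to), grammar_filename)
    return grammar_filename


print(resolve_grammar_path("calc.lark", rel_to="/project/pkg/parser.py"))
# /project/pkg/calc.lark on POSIX systems
```

With Lark itself, the corresponding call would be along the lines of `Lark.open("calc.lark", rel_to=__file__, parser="lalr")`.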
@@ -426,17 +414,15 @@ class Lark(Serialize):
def parse(self, text, start=None, on_error=None):
"""Parse the given text, according to the options provided.


If a transformer is supplied to ``__init__``, returns whatever is the
result of the transformation.

Args:
Parameters:
text (str): Text to be parsed.
start (str, optional): Required if Lark was given multiple
possible start symbols (using the start option).
on_error (function, optional): if provided, will be called on
UnexpectedToken error. Return true to resume parsing.
LALR only. See examples/error_puppet.py for an example
of how to use on_error.
start (str, optional): Required if Lark was given multiple possible start symbols (using the start option).
on_error (function, optional): if provided, will be called on UnexpectedToken error. Return true to resume parsing.
LALR only. See examples/error_puppet.py for an example of how to use on_error.

Returns:
If a transformer is supplied to ``__init__``, returns whatever is the
result of the transformation. Otherwise, returns a Tree instance.


""" """




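The `on_error` contract in the docstring above (called on an UnexpectedToken error; a true return value resumes parsing) can be illustrated with a toy token consumer. Everything here (`ToyUnexpectedToken`, `parse_numbers`, and the choice to drop the offending token on resume) is a hypothetical sketch of the control flow only, not Lark's LALR implementation; examples/error_puppet.py shows real usage.

```python
from typing import Callable, List, Optional


class ToyUnexpectedToken(Exception):
    def __init__(self, token: str, pos: int):
        super().__init__(f"unexpected token {token!r} at {pos}")
        self.token = token
        self.pos = pos


def parse_numbers(
    tokens: List[str],
    on_error: Optional[Callable[[ToyUnexpectedToken], bool]] = None,
) -> List[int]:
    """Toy parser: accepts only integer tokens, consulting on_error for anything else."""
    result = []
    for pos, tok in enumerate(tokens):
        if tok.isdigit():
            result.append(int(tok))
            continue
        err = ToyUnexpectedToken(tok, pos)
        # No handler, or the handler returned a false value: stop with the error
        if on_error is None or not on_error(err):
            raise err
        # Handler returned a true value: skip the bad token and resume
    return result


skipped = []


def log_and_continue(err: ToyUnexpectedToken) -> bool:
    skipped.append(err.token)
    return True


print(parse_numbers(["1", "+", "2", "x", "3"], on_error=log_and_continue))  # [1, 2, 3]
print(skipped)  # ['+', 'x']
```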
lark/parsers/lalr_puppet.py (+6, -9)

@@ -7,11 +7,9 @@ from .. import Token




class ParserPuppet(object):
"""ParserPuppet gives you advanced control over error handling when
parsing with LALR.
"""ParserPuppet gives you advanced control over error handling when parsing with LALR.


For a simpler, more streamlined interface, see the ``on_error``
argument to ``Lark.parse()``.
For a simpler, more streamlined interface, see the ``on_error`` argument to ``Lark.parse()``.
""" """
def __init__(self, parser, state_stack, value_stack, start, stream, set_state): def __init__(self, parser, state_stack, value_stack, start, stream, set_state):
self.parser = parser self.parser = parser
@@ -24,8 +22,7 @@ class ParserPuppet(object):
self.result = None


def feed_token(self, token):
"""Feed the parser with a token, and advance it to the next state,
as if it received it from the lexer.
"""Feed the parser with a token, and advance it to the next state, as if it received it from the lexer.


Note that ``token`` has to be an instance of ``Token``.
"""
@@ -89,9 +86,9 @@ class ParserPuppet(object):
return '\n'.join(out)


def choices(self):
"""Returns a dictionary of token types, matched to their action in
the parser. Only returns token types that are accepted by the
current state.
"""Returns a dictionary of token types, matched to their action in the parser.
Only returns token types that are accepted by the current state.


Updated by ``feed_token()``.
"""


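As a rough mental model of `feed_token()` and `choices()`, here is a toy deterministic automaton. It is not an LALR parser (which also keeps a value stack and performs reductions), and `ToyPuppet` and its transition table are hypothetical; the point is only that `choices()` reflects the current state and changes after each fed token.

```python
from typing import Dict, Tuple

# Hypothetical transition table: (state, token type) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "NAME"): "after_name",
    ("after_name", "EQUAL"): "after_equal",
    ("after_equal", "NUMBER"): "done",
    ("after_equal", "NAME"): "done",
}


class ToyPuppet:
    def __init__(self, state: str = "start"):
        self.state = state

    def choices(self) -> Dict[str, str]:
        """Token types accepted in the current state, mapped to the resulting state."""
        return {tok: nxt for (st, tok), nxt in TRANSITIONS.items() if st == self.state}

    def feed_token(self, token_type: str) -> None:
        """Advance to the next state, as if the token had come from the lexer."""
        try:
            self.state = TRANSITIONS[(self.state, token_type)]
        except KeyError:
            raise ValueError(f"{token_type} not accepted in state {self.state!r}") from None


puppet = ToyPuppet()
print(sorted(puppet.choices()))  # ['NAME']
puppet.feed_token("NAME")
puppet.feed_token("EQUAL")
print(sorted(puppet.choices()))  # ['NAME', 'NUMBER']
```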
lark/tree.py (+7, -10)

@@ -18,15 +18,14 @@ class Meta:
class Tree(object):
"""The main tree class.


Creates a new tree, and stores "data" and "children" in attributes of
the same name. Trees can be hashed and compared.
Creates a new tree, and stores "data" and "children" in attributes of the same name.
Trees can be hashed and compared.


Args:
Parameters:
data: The name of the rule or alias
children: List of matched sub-rules and terminals
meta: Line & Column numbers (if ``propagate_positions`` is enabled).
meta attributes: line, column, start_pos, end_line,
end_column, end_pos
meta attributes: line, column, start_pos, end_line, end_column, end_pos
""" """
def __init__(self, data, children, meta=None): def __init__(self, data, children, meta=None):
self.data = data self.data = data
@@ -79,9 +78,8 @@ class Tree(object):


def iter_subtrees(self):
"""Depth-first iteration.
Iterates over all the subtrees, never returning to the
same node twice (Lark's parse-tree is actually a DAG).

Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG).
""" """
queue = [self] queue = [self]
subtrees = OrderedDict() subtrees = OrderedDict()
@@ -121,8 +119,7 @@ class Tree(object):
def iter_subtrees_topdown(self):
"""Breadth-first iteration.


Iterates over all the subtrees, returning nodes in the same order
as pretty() does.
Iterates over all the subtrees, returning nodes in the same order as pretty() does.
""" """
stack = [self] stack = [self]
while stack: while stack:


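The remark that Lark's parse tree "is actually a DAG" means the same subtree object can be reachable along more than one path. A hedged sketch of a depth-first, children-first iteration that yields each node only once, deduplicating by object identity (the `Node` class and function name are hypothetical, and the exact order Lark produces may differ):

```python
from typing import Iterator, List, Optional


class Node:
    def __init__(self, data: str, children: Optional[List["Node"]] = None):
        self.data = data
        self.children = children or []


def iter_subtrees_once(root: Node) -> Iterator[Node]:
    """Post-order (children first) iteration that never yields the same node twice."""
    seen = set()
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            yield node
            continue
        if id(node) in seen:
            continue  # already reached through another parent
        seen.add(id(node))
        # Re-push the node so it is yielded after all of its children
        stack.append((node, True))
        for child in reversed(node.children):
            stack.append((child, False))


shared = Node("shared")
root = Node("root", [Node("left", [shared]), Node("right", [shared])])
print([n.data for n in iter_subtrees_once(root)])  # ['shared', 'left', 'right', 'root']
```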
lark/visitors.py (+36, -67)

@@ -45,28 +45,23 @@ class _Decoratable:




class Transformer(_Decoratable):
"""Transformer visit each node of the tree, and run the appropriate method
on it according to the node's data.
"""Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.


Calls its methods (provided by user via inheritance) according to
``tree.data``. The returned value replaces the old one in the structure.
Calls its methods (provided by user via inheritance) according to ``tree.data``.
The returned value replaces the old one in the structure.


They work bottom-up (or depth-first), starting with the leaves and
ending at the root of the tree. Transformers can be used to
implement map & reduce patterns. Because nodes are reduced from leaf to
root, at any point the callbacks may assume the children have already been
transformed (if applicable). ``Transformer`` can do anything ``Visitor``
can do, but because it reconstructs the tree, it is slightly less
efficient.
They work bottom-up (or depth-first), starting with the leaves and ending at the root of the tree.
Transformers can be used to implement map & reduce patterns. Because nodes are reduced from leaf to root,
at any point the callbacks may assume the children have already been transformed (if applicable).

``Transformer`` can do anything ``Visitor`` can do, but because it reconstructs the tree,
it is slightly less efficient. It can be used to implement map or reduce patterns.


All these classes implement the transformer interface:


- ``Transformer`` - Recursively transforms the tree. This is the one you
probably want.
- ``Transformer_InPlace`` - Non-recursive. Changes the tree in-place
instead of returning new instances
- ``Transformer_InPlaceRecursive`` - Recursive. Changes the tree in-place
instead of returning new instances
- ``Transformer`` - Recursively transforms the tree. This is the one you probably want.
- ``Transformer_InPlace`` - Non-recursive. Changes the tree in-place instead of returning new instances
- ``Transformer_InPlaceRecursive`` - Recursive. Changes the tree in-place instead of returning new instances


Example:
::
@@ -82,7 +77,7 @@ class Transformer(_Decoratable):


# Prints: Tree(a, [3])


Args:
Parameters:
visit_tokens: By default, transformers only visit rules.
visit_tokens=True will tell ``Transformer`` to visit tokens
as well. This is a slightly slower alternative to lexer_callbacks
@@ -164,16 +159,16 @@ class Transformer(_Decoratable):
def __default__(self, data, children, meta):
"""Default operation on tree (for override)


Function that is called if a function with a corresponding name has
not been found. Defaults to reconstruct the Tree
Function that is called if a function with a corresponding name has not been found.
Defaults to reconstruct the Tree.
"""
return Tree(data, children, meta)


def __default_token__(self, token):
"""Default operation on token (for override)
Function that is called if a function with a corresponding name has
not been found. Defaults to just return the argument.
Function that is called if a function with a corresponding name has not been found.
Defaults to just return the argument.
"""
return token


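A self-contained sketch of the transformer behavior described in the docstrings above: subtrees are transformed first, each callback receives the already-transformed children, its return value replaces the node, and `__default__` rebuilds the node when no method matches. `T`, `ToyTransformer`, and `EvalSums` are hypothetical names, and the recursion shown corresponds only to the recursive flavor.

```python
from typing import Any, List


class T:
    """Hypothetical minimal tree node."""

    def __init__(self, data: str, children: List[Any]):
        self.data = data
        self.children = children

    def __repr__(self) -> str:
        return f"T({self.data!r}, {self.children!r})"


class ToyTransformer:
    def transform(self, tree: T) -> Any:
        # Bottom-up: transform subtrees first, then call the method for this node
        children = [self.transform(c) if isinstance(c, T) else c for c in tree.children]
        callback = getattr(self, tree.data, None)
        if callback is None:
            return self.__default__(tree.data, children)
        return callback(children)

    def __default__(self, data: str, children: List[Any]) -> T:
        # No matching method: rebuild the node from the transformed children
        return T(data, children)


class EvalSums(ToyTransformer):
    def number(self, children: List[str]) -> int:
        return int(children[0])

    def add(self, children: List[int]) -> int:
        return sum(children)


tree = T("expr", [T("add", [T("number", ["1"]), T("number", ["2"])])])
print(EvalSums().transform(tree))  # T('expr', [3])
```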
@@ -259,25 +254,6 @@ class Transformer_InPlaceRecursive(Transformer):
# Visitors


class VisitorBase:
"""Visitors visit each node of the tree

Run the appropriate method on it according to the node's data.
They work bottom-up, starting with the leaves and ending at the root
of the tree. There are two classes that implement the visitor interface:

- ``Visitor``: Visit every node (without recursion)
- ``Visitor_Recursive``: Visit every node using recursion. Slightly faster.

Example:
::

    class IncreaseAllNumbers(Visitor):
        def number(self, tree):
            assert tree.data == "number"
            tree.children[0] += 1

    IncreaseAllNumbers().visit(parse_tree)
"""
def _call_userfunc(self, tree):
return getattr(self, tree.data, self.__default__)(tree)


@@ -293,8 +269,7 @@ class Visitor(VisitorBase):
"""Bottom-up visitor, non-recursive. """Bottom-up visitor, non-recursive.


Visits the tree, starting with the leaves and finally the root (bottom-up)
Calls its methods (provided by user via inheritance) according to
``tree.data``
Calls its methods (provided by user via inheritance) according to ``tree.data``
""" """


def visit(self, tree):
@@ -312,8 +287,7 @@ class Visitor_Recursive(VisitorBase):
"""Bottom-up visitor, recursive. """Bottom-up visitor, recursive.


Visits the tree, starting with the leaves and finally the root (bottom-up)
Calls its methods (provided by user via inheritance) according to
``tree.data``
Calls its methods (provided by user via inheritance) according to ``tree.data``
""" """


def visit(self, tree):
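The `Interpreter` docstring in the next hunk stresses that sub-branches are not visited automatically. A toy top-down evaluator shows why that matters: an `if_else` node can evaluate only the branch it needs, while a `block` node opts in to visiting all of its children. `ToyInterpreter`, `Node`, and the rule names are hypothetical; only the explicit `visit` / `visit_children` calls reflect the behavior described.

```python
from typing import Any, List


class Node:
    def __init__(self, data: str, children: List[Any]):
        self.data = data
        self.children = children


class ToyInterpreter:
    """Top-down walker: each method decides which children, if any, to visit."""

    def visit(self, tree: Node) -> Any:
        return getattr(self, tree.data)(tree)

    def visit_children(self, tree: Node) -> List[Any]:
        return [self.visit(c) if isinstance(c, Node) else c for c in tree.children]


class Evaluator(ToyInterpreter):
    def __init__(self):
        self.trace: List[str] = []

    def literal(self, tree: Node) -> Any:
        self.trace.append(f"literal {tree.children[0]!r}")
        return tree.children[0]

    def block(self, tree: Node) -> Any:
        # Explicitly visit every child and return the last result
        return self.visit_children(tree)[-1]

    def if_else(self, tree: Node) -> Any:
        cond, then_branch, else_branch = tree.children
        # Only the chosen branch is visited; the other one is never evaluated
        return self.visit(then_branch if self.visit(cond) else else_branch)


program = Node("block", [
    Node("literal", [1]),
    Node("if_else", [Node("literal", [True]), Node("literal", ["yes"]), Node("literal", ["no"])]),
])
ev = Evaluator()
print(ev.visit(program))  # yes
print(ev.trace)           # ['literal 1', 'literal True', "literal 'yes'"]
```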
@@ -348,13 +322,12 @@ class Interpreter(_Decoratable):
"""Interpreter walks the tree starting at the root. """Interpreter walks the tree starting at the root.


Visits the tree, starting with the root and finally the leaves (top-down)
Calls its methods (provided by user via inheritance) according to
``tree.data``


Unlike ``Transformer`` and ``Visitor``, the Interpreter doesn't
automatically visit its sub-branches. The user has to explicitly call ``visit``,
``visit_children``, or use the ``@visit_children_decor``. This allows the
user to implement branching and loops.
For each tree node, it calls its methods (provided by user via inheritance) according to ``tree.data``.

Unlike ``Transformer`` and ``Visitor``, the Interpreter doesn't automatically visit its sub-branches.
The user has to explicitly call ``visit``, ``visit_children``, or use the ``@visit_children_decor``.
This allows the user to implement branching and loops.


Example:
::
@@ -452,21 +425,17 @@ def _vargs_tree(f, data, children, meta):




def v_args(inline=False, meta=False, tree=False, wrapper=None):
"""A convenience decorator factory for modifying the behavior of
user-supplied visitor methods.

By default, callback methods of transformers/visitors accept one argument -
a list of the node's children. ``v_args`` can modify this behavior. When
used on a transformer/visitor class definition, it applies to all the
callback methods inside it. Accepts one of three following flags.

Args:
inline: Children are provided as ``*args`` instead of a list
argument (not recommended for very long lists).
meta: Provides two arguments: ``children`` and ``meta`` (instead of
just the first)
tree: Provides the entire tree as the argument, instead of the
children.
"""A convenience decorator factory for modifying the behavior of user-supplied visitor methods.

By default, callback methods of transformers/visitors accept one argument - a list of the node's children.

``v_args`` can modify this behavior. When used on a transformer/visitor class definition,
it applies to all the callback methods inside it.

Parameters:
inline: Children are provided as ``*args`` instead of a list argument (not recommended for very long lists).
meta: Provides two arguments: ``children`` and ``meta`` (instead of just the first)
tree: Provides the entire tree as the argument, instead of the children.


Example:
::


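The `inline` flag documented in the `v_args` hunk above changes a callback from receiving one list of children to receiving the children as positional arguments. A hypothetical decorator, `inline_children`, sketches that adaptation; it is not Lark's implementation, which also handles `meta`, `tree`, `wrapper`, and class-level decoration.

```python
import functools
from typing import Any, Callable, List


def inline_children(method: Callable[..., Any]) -> Callable[[Any, List[Any]], Any]:
    """Adapt method(self, *children) to the default convention method(self, children)."""
    @functools.wraps(method)
    def wrapper(self, children: List[Any]) -> Any:
        return method(self, *children)
    return wrapper


class Calc:
    # Default style: one argument, the list of children
    def add(self, children: List[int]) -> int:
        return children[0] + children[1]

    # Inline style: children arrive as separate positional arguments
    @inline_children
    def sub(self, left: int, right: int) -> int:
        return left - right


calc = Calc()
print(calc.add([5, 3]), calc.sub([5, 3]))  # 8 2
```

Both methods are called the same way by whatever dispatches them; only the signature the user writes differs.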