
Nicer and more correct documentation

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.8.3
Erez Sh, 4 years ago
Commit acde77eef8
2 changed files with 114 additions and 46 deletions
  1. docs/classes.md (+63, -18)
  2. lark/lark.py (+51, -28)

docs/classes.md (+63, -18)

@@ -8,41 +8,86 @@ This page details the important classes in Lark.

The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.

#### \_\_init\_\_(self, grammar, **options)
#### \_\_init\_\_(self, grammar_string, **options)

The Lark class accepts a grammar string or file object, and keyword options:
Creates an instance of Lark with the given grammar

* **start** - A list of the rules in the grammar that begin the parse (Default: `["start"]`)
#### open(cls, grammar_filename, rel_to=None, **options)

* **parser** - Decides which parser engine to use, "earley", "lalr" or "cyk". (Default: `"earley"`)
Creates an instance of Lark with the grammar given by its filename

* **lexer** - Overrides default lexer, depending on parser.
If rel_to is provided, the function will find the grammar filename in relation to it.

* **transformer** - Applies the provided transformer instead of building a parse tree (only allowed with parser="lalr")
Example:

* **postlex** - Lexer post-processing (Default: `None`. only works when lexer is "standard" or "contextual")
```python
>>> Lark.open("grammar_file.lark", rel_to=__file__, parser="lalr")
Lark(...)
```

* **ambiguity** (only relevant for earley and cyk)
#### parse(self, text)

* "explicit" - Return all derivations inside an "_ambig" data node.
Return a complete parse tree for the text (of type Tree)

* "resolve" - Let the parser choose the best derivation (greedy for tokens, non-greedy for rules. Default)
If a transformer is supplied to `__init__`, returns whatever is the result of the transformation.

* **debug** - Display warnings (such as Shift-Reduce warnings for LALR)

* **keep_all_tokens** - Don't throw away any terminals from the tree (Default=`False`)
#### save(self, f) / load(cls, f)

* **propagate_positions** - Propagate line/column count to tree nodes, at the cost of performance (default=`False`)
Useful for caching and multiprocessing.

* **maybe_placeholders** - When True, the `[]` operator returns `None` when not matched. When `False`, `[]` behaves like the `?` operator, and return no value at all, which may be a little faster (default=`False`)
`save` saves the instance into the given file object

* **lexer_callbacks** - A dictionary of callbacks of type f(Token) -> Token, used to interface with the lexer Token generation. Only works with the standard and contextual lexers. See [Recipes](recipes.md) for more information.
`load` loads an instance from the given file object

#### parse(self, text)
####

Return a complete parse tree for the text (of type Tree)

If a transformer is supplied to `__init__`, returns whatever is the result of the transformation.
### Lark Options
#### General options

**start** - The start symbol. Either a string, or a list of strings for multiple possible starts (Default: "start")
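Because `start` may be either a single rule name or a list of names, code that consumes it has to normalize it first. The sketch below shows one way to do that. `normalize_start` is a hypothetical helper written for illustration, not part of Lark's API.

```python
from typing import List, Sequence, Union


def normalize_start(start: Union[str, Sequence[str]] = "start") -> List[str]:
    """Hypothetical helper: turn the `start` option into a list of rule names.

    A plain string names a single start rule; a list or tuple names several.
    """
    if isinstance(start, str):
        return [start]
    names = list(start)
    if not names:
        raise ValueError("start must name at least one rule")
    return names


print(normalize_start())                  # ['start']
print(normalize_start(["expr", "stmt"]))  # ['expr', 'stmt']
```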

**debug** - Display debug information, such as warnings (default: False)

**transformer** - Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster)

**propagate_positions** - Propagates (line, column, end_line, end_column) attributes into all tree branches.

**maybe_placeholders** -
- When True, the `[]` operator returns `None` when not matched.
- When `False`, `[]` behaves like the `?` operator, and returns no value at all.
- (default=`False`. Recommended to set to `True`)

**g_regex_flags** - Flags that are applied to all terminals (both regex and strings)
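To illustrate what "applied to all terminals (both regex and strings)" means, the sketch below uses only Python's standard `re` module: one regex terminal and one string terminal are compiled with the same flag. It simulates the described behaviour and does not call Lark; the terminal names are hypothetical, and it assumes the option takes ordinary `re` flag values such as `re.IGNORECASE`.

```python
import re

# Illustration only: two terminals as a grammar might define them,
# one as a regular expression and one as a plain string literal.
NUMBER_REGEX = r"[0-9]+"
SELECT_LITERAL = "select"

# One global flag set, analogous to what g_regex_flags describes.
GLOBAL_FLAGS = re.IGNORECASE

# The string terminal is escaped so its characters match literally,
# but it still receives the same flags as the regex terminal.
compiled = {
    "NUMBER": re.compile(NUMBER_REGEX, GLOBAL_FLAGS),
    "SELECT": re.compile(re.escape(SELECT_LITERAL), GLOBAL_FLAGS),
}

print(bool(compiled["SELECT"].fullmatch("SELECT")))  # True: the flag also applies to the literal
print(bool(compiled["NUMBER"].fullmatch("42")))      # True
```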

**keep_all_tokens** - Prevent the tree builder from automagically removing "punctuation" tokens (default: False)

**cache_grammar** - Cache the Lark grammar (Default: False)

#### Algorithm

**parser** - Decides which parser engine to use, "earley" or "lalr". (Default: "earley")
(there is also a "cyk" option for legacy)

**lexer** - Decides whether or not to use a lexer stage

- "auto" (default): Choose for me based on the parser
- "standard": Use a standard lexer
- "contextual": Stronger lexer (only works with parser="lalr")
- "dynamic": Flexible and powerful (only with parser="earley")
- "dynamic_complete": Same as dynamic, but tries *every* variation of tokenizing possible. (only with parser="earley")
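The parser/lexer restrictions listed above can be expressed as a small lookup, sketched below. `resolve_lexer` is hypothetical, and the choice made for `"auto"` (contextual for LALR, dynamic for Earley) is an assumption believed to match Lark 0.8 that should be checked against the source; the allowed combinations come directly from the option descriptions.

```python
# Hypothetical sketch: what "auto" resolves to for each parser (assumed mapping).
AUTO_LEXER = {
    "lalr": "contextual",
    "earley": "dynamic",
}

# Explicit lexers each parser accepts, per the descriptions above:
# "contextual" only with lalr; "dynamic"/"dynamic_complete" only with earley.
ALLOWED_LEXERS = {
    "lalr": {"standard", "contextual"},
    "earley": {"standard", "dynamic", "dynamic_complete"},
}


def resolve_lexer(parser: str = "earley", lexer: str = "auto") -> str:
    """Return the concrete lexer name, or raise if the combination is invalid."""
    if parser not in ALLOWED_LEXERS:
        raise ValueError(f"unknown parser: {parser!r}")
    if lexer == "auto":
        return AUTO_LEXER[parser]
    if lexer not in ALLOWED_LEXERS[parser]:
        raise ValueError(f"lexer {lexer!r} cannot be used with parser {parser!r}")
    return lexer


print(resolve_lexer("lalr"))    # contextual
print(resolve_lexer("earley"))  # dynamic
```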

**ambiguity** - Decides how to handle ambiguity in the parse. Only relevant if parser="earley"
- "resolve": The parser will automatically choose the simplest derivation (it chooses consistently: greedy for tokens, non-greedy for rules)
- "explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest).

#### Domain Specific

- **postlex** - Lexer post-processing (Default: None) Only works with the standard and contextual lexers.
- **priority** - How priorities should be evaluated - auto, none, normal, invert (Default: auto)
- **lexer_callbacks** - Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
- **edit_terminals** - A callback

----
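The `rel_to` behaviour of `Lark.open` described in this diff can be sketched with `os.path`. `resolve_grammar_path` is a hypothetical helper; the sketch assumes the grammar filename is joined onto the directory that contains `rel_to`, which is why passing `rel_to=__file__` finds a grammar stored next to the calling module regardless of the current working directory.

```python
import os
from typing import Optional


def resolve_grammar_path(grammar_filename: str, rel_to: Optional[str] = None) -> str:
    """Hypothetical helper: locate a grammar file relative to another file."""
    if rel_to:
        # Look for the grammar in the same directory as rel_to (e.g. __file__).
        base_dir = os.path.dirname(rel_to)
        return os.path.join(base_dir, grammar_filename)
    # Without rel_to, the name is used as given (relative to the CWD).
    return grammar_filename
```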



lark/lark.py (+51, -28)

@@ -22,33 +22,56 @@ class LarkOptions(Serialize):

"""
OPTIONS_DOC = """
parser - Decides which parser engine to use, "earley" or "lalr". (Default: "earley")
Note: "lalr" requires a lexer

lexer - Decides whether or not to use a lexer stage
"standard": Use a standard lexer
"contextual": Stronger lexer (only works with parser="lalr")
"dynamic": Flexible and powerful (only with parser="earley")
"dynamic_complete": Same as dynamic, but tries *every* variation
of tokenizing possible. (only with parser="earley")
"auto" (default): Choose for me based on grammar and parser

ambiguity - Decides how to handle ambiguity in the parse. Only relevant if parser="earley"
"resolve": The parser will automatically choose the simplest derivation
(it chooses consistently: greedy for tokens, non-greedy for rules)
"explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest).

transformer - Applies the transformer to every parse tree
debug - Affects verbosity (default: False)
keep_all_tokens - Don't automagically remove "punctuation" tokens (default: False)
cache_grammar - Cache the Lark grammar (Default: False)
postlex - Lexer post-processing (Default: None) Only works with the standard and contextual lexers.
start - The start symbol, either a string, or a list of strings for multiple possible starts (Default: "start")
priority - How priorities should be evaluated - auto, none, normal, invert (Default: auto)
propagate_positions - Propagates [line, column, end_line, end_column] attributes into all tree branches.
lexer_callbacks - Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
maybe_placeholders - Experimental feature. Instead of omitting optional rules (i.e. rule?), replace them with None
g_regex_flags - Flags that are applied to all Terminals (Regex and Strings)
# General

start - The start symbol. Either a string, or a list of strings for
multiple possible starts (Default: "start")
debug - Display debug information, such as warnings (default: False)
transformer - Applies the transformer to every parse tree (equivalent to
applying it after the parse, but faster)
propagate_positions - Propagates (line, column, end_line, end_column)
attributes into all tree branches.
maybe_placeholders - When True, the `[]` operator returns `None` when not matched.
When `False`, `[]` behaves like the `?` operator,
and returns no value at all.
(default=`False`. Recommended to set to `True`)
cache_grammar - Cache the Lark grammar (Default: False)
g_regex_flags - Flags that are applied to all terminals
(both regex and strings)
keep_all_tokens - Prevent the tree builder from automagically
removing "punctuation" tokens (default: False)

# Algorithm

parser - Decides which parser engine to use
Accepts "earley" or "lalr". (Default: "earley")
(there is also a "cyk" option for legacy)

lexer - Decides whether or not to use a lexer stage
"auto" (default): Choose for me based on the parser
"standard": Use a standard lexer
"contextual": Stronger lexer (only works with parser="lalr")
"dynamic": Flexible and powerful (only with parser="earley")
"dynamic_complete": Same as dynamic, but tries *every* variation
of tokenizing possible.

ambiguity - Decides how to handle ambiguity in the parse.
Only relevant if parser="earley"
"resolve": The parser will automatically choose the simplest
derivation (it chooses consistently: greedy for
tokens, non-greedy for rules)
"explicit": The parser will return all derivations wrapped
in "_ambig" tree nodes (i.e. a forest).

# Domain Specific

postlex - Lexer post-processing (Default: None) Only works with the
standard and contextual lexers.
priority - How priorities should be evaluated - auto, none, normal,
invert (Default: auto)
lexer_callbacks - Dictionary of callbacks for the lexer. May alter
tokens during lexing. Use with caution.
edit_terminals - A callback
"""
if __doc__:
__doc__ += OPTIONS_DOC
@@ -219,7 +242,7 @@ class Lark(Serialize):
self.lexer = self._build_lexer()

if __init__.__doc__:
__init__.__doc__ += "\nOPTIONS:" + LarkOptions.OPTIONS_DOC
__init__.__doc__ += "\nOptions:\n" + LarkOptions.OPTIONS_DOC

__serialize_fields__ = 'parser', 'rules', 'options'
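The `if __doc__:` and `if __init__.__doc__:` guards in this file matter because running Python with `-OO` strips docstrings, leaving `None`, and `+=` on `None` raises `TypeError`. The standalone sketch below reproduces the pattern with hypothetical `Options` and `Parser` classes; it is an illustration, not Lark's code.

```python
OPTIONS_DOC = """
start - The start symbol (Default: "start")
debug - Display debug information (default: False)
"""


class Options:
    """Options for a hypothetical parser."""

    # In the class body, __doc__ holds the docstring above. Under `python -OO`
    # the docstring is stripped and this lookup yields None instead.
    if __doc__:
        __doc__ += OPTIONS_DOC


class Parser:
    def __init__(self, grammar, **options):
        "Create a parser for the given grammar."
        self.grammar = grammar
        self.options = options

    # Same guard for a method docstring: only extend it when it exists.
    if __init__.__doc__:
        __init__.__doc__ += "\nOptions:\n" + OPTIONS_DOC
```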


