
Improve docs

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.11.0
Erez Sh 4 years ago
parent
commit
47df8b5a6a
2 changed files with 15 additions and 4 deletions
  1. docs/parsers.md (+14, -3)
  2. lark/parsers/earley.py (+1, -1)

docs/parsers.md (+14, -3)

@@ -25,9 +25,18 @@ Lark provides the following options to combat ambiguity:

3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves.

**dynamic_complete**
**lexer="dynamic_complete"**

Earley's "dynamic" lexer uses regular expressions in order to tokenize the text. It tries every possible combination of terminals, but it matches each terminal exactly once, returning the longest possible match.

That means, for example, that when `lexer="dynamic"` (which is the default), the terminal `/a+/`, when given the text `"aa"`, will return one result, `aa`, even though `a` would also be correct.

This behavior was chosen because it is much faster, and it is usually what you would expect.

Setting `lexer="dynamic_complete"` instructs the lexer to consider every possible regexp match. This ensures that the parser will consider and resolve every ambiguity, even inside the terminals themselves. This lexer provides the same capabilities as scannerless Earley, but with different performance tradeoffs.

Warning: This lexer can be much slower, especially for open-ended terminals such as `/.*/`
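
To make the difference concrete, here is a minimal sketch of the two matching strategies using only Python's `re` module. It is an illustration of the idea described above, not Lark's actual implementation; the helper names `longest_match` and `all_matches` are hypothetical, and the sketch assumes that a greedy regex match stands in for "the longest possible match" (true for `/a+/`, not for every pattern).

```python
import re


def longest_match(pattern, text, pos=0):
    """Return the single greedy match at `pos`, or None.

    Stands in for the "dynamic" behavior: one result per terminal.
    """
    m = re.compile(pattern).match(text, pos)
    return m.group(0) if m else None


def all_matches(pattern, text, pos=0):
    """Return every non-empty substring starting at `pos` that the pattern fully matches.

    Stands in for the "dynamic_complete" behavior: every candidate match is kept.
    """
    regex = re.compile(pattern)
    return [
        text[pos:end]
        for end in range(pos + 1, len(text) + 1)
        if regex.fullmatch(text, pos, end)
    ]


print(longest_match(r"a+", "aa"))  # aa
print(all_matches(r"a+", "aa"))    # ['a', 'aa']
```

With an open-ended pattern such as `.*`, `all_matches` returns one candidate per remaining character at every position, which gives a feel for why the warning above applies.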

**TODO: Add documentation on dynamic_complete**

## LALR(1)

@@ -37,7 +46,9 @@ Lark comes with an efficient implementation that outperforms every other parsing

Lark extends the traditional YACC-based architecture with a *contextual lexer*, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.

The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of tokens. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and allows one to parse languages that LALR(1) was previously incapable of parsing.
The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of terminals. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and allows one to parse languages that LALR(1) was previously incapable of parsing.

(If you're familiar with YACC, you can think of it as automatic lexer-states)

This is an improvement to LALR(1) that is unique to Lark.
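
The idea of narrowing the lexer's choices by parser state can be sketched in a few lines of plain Python. This is a toy illustration, not Lark's contextual lexer: the `TERMINALS` table and `next_token` helper are hypothetical, and the `allowed` set is written by hand here, whereas in the scheme described above it would come from the parser's current state.

```python
import re

# Two terminals that collide: the text "if" matches both of them.
TERMINALS = {
    "IF": re.compile(r"if"),
    "NAME": re.compile(r"[a-z]+"),
}


def next_token(text, pos, allowed):
    """Try only the terminals in `allowed`, in table order, and return (name, value)."""
    for name, regex in TERMINALS.items():
        if name in allowed:
            m = regex.match(text, pos)
            if m:
                return name, m.group(0)
    raise ValueError(f"no allowed terminal matches at position {pos}")


# A state where both terminals are legal: the collision is resolved by table order.
print(next_token("if", 0, {"IF", "NAME"}))  # ('IF', 'if')

# A state where only NAME is legal: the same text is no longer ambiguous.
print(next_token("if", 0, {"NAME"}))        # ('NAME', 'if')
```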



lark/parsers/earley.py (+1, -1)

@@ -1,4 +1,4 @@
"""This module implements an scanerless Earley parser.
"""This module implements an Earley parser.

The core Earley algorithm used here is based on Elizabeth Scott's implementation, here:
https://www.sciencedirect.com/science/article/pii/S1571066108001497

