Improved docs (WIP)

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.6.6
Erez Shinan 6 years ago
parent
commit
7ba0e05099
6 changed files with 65 additions and 41 deletions
  1. README.md (+1, -1)
  2. docs/classes.md (+5, -1)
  3. docs/features.md (+6, -36)
  4. docs/parsers.md (+49, -0)
  5. docs/philosophy.md (+3, -3)
  6. mkdocs.yml (+1, -0)

README.md (+1, -1)

@@ -8,7 +8,7 @@ Parse any context-free grammar, FAST and EASY!


Lark can:


- Parse all context-free grammars, and handle all ambiguity
- Parse all context-free grammars, and handle any ambiguity
- Build a parse-tree automagically, no construction code required
- Outperform all other Python libraries when using LALR(1) (Yes, including PLY)
- Run on every Python interpreter (it's pure-python)


docs/classes.md (+5, -1)

@@ -76,10 +76,14 @@ Returns all nodes of the tree whose data equals the given data.


#### iter_subtrees(self)


Depth-first iteration.

Iterates over all the subtrees, never returning to the same node twice (Lark's parse-tree is actually a DAG).


#### iter_subtrees_topdown(self)


Breadth-first iteration.

Iterates over all the subtrees, returning nodes in the same order as pretty() does.


#### \_\_eq\_\_, \_\_hash\_\_
@@ -122,7 +126,7 @@ There are two classes that implement the visitor interface:


Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.


They work bottom-up, starting with the leaves and ending at the root of the tree.
They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.


Transformers can be used to implement map & reduce patterns.
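The bottom-up order described above can be illustrated with a small stand-alone sketch. It does not use Lark: `Node`, `SketchTransformer`, and `Calculate` are hypothetical names invented for this example, and the dispatch rule (call the method whose name equals the node's data) is only an approximation of the behaviour the text describes.

```python
class Node:
    """A minimal stand-in for a parse-tree node: a label plus children."""
    def __init__(self, data, children):
        self.data = data
        self.children = children


class SketchTransformer:
    """Transforms children first, then calls the method named after node.data."""
    def transform(self, node):
        if not isinstance(node, Node):
            return node  # leaves (plain tokens) pass through unchanged
        # Bottom-up: every child is transformed before its parent is handled.
        children = [self.transform(child) for child in node.children]
        handler = getattr(self, node.data, None)
        if handler is None:
            return Node(node.data, children)  # no matching method: rebuild the node
        return handler(children)


class Calculate(SketchTransformer):
    def number(self, children):
        return int(children[0])      # "map": convert a leaf to a value

    def add(self, children):
        return sum(children)         # "reduce": fold already-transformed children

    def mul(self, children):
        result = 1
        for value in children:
            result *= value
        return result


# Tree for (2 + 3) * 4
tree = Node("mul", [
    Node("add", [Node("number", ["2"]), Node("number", ["3"])]),
    Node("number", ["4"]),
])
result = Calculate().transform(tree)
print(result)
```

Because each handler only ever sees children that have already been transformed, `add` and `mul` can work on plain integers without inspecting the tree themselves.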




docs/features.md (+6, -36)

@@ -1,5 +1,8 @@
# Main Features

- Earley parser, capable of parsing any context-free grammar
- Implements SPPF, for efficient parsing and storing of ambiguous grammars.
- LALR(1) parser, limited in power of expression, but efficient in space and performance (O(n)).
- Implements a parse-aware lexer that provides a better power of expression than traditional implementations.
- EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md))
- Builds a parse-tree (AST) automagically based on the grammar
- Stand-alone parser generator - create a small independent parser to embed in your project.
@@ -11,46 +14,13 @@
- Python 2 & Python 3 compatible
- Pure-Python implementation


## Parsers

Lark implements the following parsing algorithms:

### Earley

An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar in O(n^3) time, and in O(n^2) when the grammar is unambiguous. It can parse most LR grammars in O(n). Most programming languages are LR, and can be parsed in linear time.

Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.

It's possible to bypass the dynamic lexer, and use the regular Earley parser with a traditional lexer that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`.

**Note on ambiguity**

Lark by default can handle any ambiguity in the grammar (Earley+dynamic). The user may request to receive all derivations (using ambiguity='explicit'), or let Lark automatically choose the most fitting derivation (default behavior).

Lark also supports user-defined rule priority to steer the automatic ambiguity resolution.

### LALR(1)

[LALR(1)](https://www.wikiwand.com/en/LALR_parser) is a very efficient, tried-and-true parsing algorithm. It's incredibly fast and requires very little memory. It can parse most programming languages (For example: Python and Java).

Lark comes with an efficient implementation that outperforms every other parsing library for Python (including PLY).

Lark extends the traditional YACC-based architecture with a *contextual lexer*, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.

The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of tokens. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and makes it possible to parse languages that LALR(1) was previously incapable of parsing.

This is an improvement to LALR(1) that is unique to Lark.

### CYK Parser

A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-free grammar at O(n^3*|G|).

It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars.
[Read more about the parsers](parsers.md)


# Extra features


- Import rules and tokens from other Lark grammars, for code reuse and modularity.
- Import grammars from Nearley.js
- CYK parser


### Experimental features
- Automatic reconstruction of input from parse-tree (see examples)


docs/parsers.md (+49, -0)

@@ -0,0 +1,49 @@

Lark implements the following parsing algorithms: Earley, LALR(1), and CYK.

# Earley

An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar in O(n^3) time, and in O(n^2) when the grammar is unambiguous. It can parse most LR grammars in O(n). Most programming languages are LR, and can be parsed in linear time.

Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.

It's possible to bypass the dynamic lexer, and use the regular Earley parser with a traditional lexer that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`.
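To make the chart-parsing idea concrete, here is a minimal textbook Earley recognizer. It is a sketch only and is not Lark's implementation: it matches tokens one at a time (no regular expressions, no skipping), it only answers yes or no, and it assumes the grammar has no empty productions. The grammar encoding and the name `earley_recognize` are invented for this example.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    grammar maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Any symbol that is not a key of `grammar` is a terminal.
    Assumption: no right-hand side is empty.
    """
    n = len(tokens)
    # chart[i] holds items (lhs, rhs, dot, origin) that are valid after i tokens.
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal expected at the dot.
                    for production in grammar[symbol]:
                        item = (symbol, production, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)
                elif i < n and tokens[i] == symbol:
                    # Scan: the next token matches the expected terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`.
                for (lhs2, rhs2, dot2, origin2) in list(chart[origin]):
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        item = (lhs2, rhs2, dot2 + 1, origin2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            worklist.append(item)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for (lhs, rhs, dot, origin) in chart[n]
    )


# An ambiguous, left-recursive grammar: expr -> expr "+" expr | "n"
GRAMMAR = {"expr": [("expr", "+", "expr"), ("n",)]}
print(earley_recognize(GRAMMAR, "expr", list("n+n+n")))
print(earley_recognize(GRAMMAR, "expr", list("n+")))
```

The example grammar is both ambiguous and left-recursive, which is the kind of grammar LALR(1) rejects but an Earley parser accepts without modification.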

**SPPF & Ambiguity resolution**

Lark implements the Shared Packed Parse Forest data-structure for the Earley parser, in order to reduce the space and computation required to handle ambiguous grammars.

You can read more about SPPF [here](http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/)

As a result, Lark can efficiently parse and store every ambiguity in the grammar, when using Earley.

Lark provides the following options to combat ambiguity:

1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.

2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.

3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.
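The cost mentioned in option 2 is easier to see with a toy enumerator that lists every parse tree of an ambiguous input. This sketch is independent of Lark and does not build an SPPF; it simply tries every way to split the input between the symbols of each rule. The function name `all_trees` and the tuple-based tree shape are assumptions made for the example, and the grammar must have no empty productions and no unit-rule cycles.

```python
from functools import lru_cache


def all_trees(grammar, start, tokens):
    """Return every parse tree of `tokens` as nested tuples (symbol, child, ...)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # All trees for `symbol` spanning tokens[i:j].
        if symbol not in grammar:  # terminal: must match exactly one token
            return (symbol,) if j == i + 1 and tokens[i] == symbol else ()
        found = []
        for rhs in grammar[symbol]:
            for children in expand(rhs, i, j):
                found.append((symbol,) + children)
        return tuple(found)

    def expand(rhs, i, j):
        # All ways to derive the symbol sequence `rhs` over tokens[i:j].
        if not rhs:
            return [()] if i == j else []
        head, rest = rhs[0], rhs[1:]
        combos = []
        # Every remaining symbol needs at least one token, hence the upper bound.
        for k in range(i + 1, j - len(rest) + 1):
            for left in derive(head, i, k):
                for right in expand(rest, k, j):
                    combos.append((left,) + right)
        return combos

    return list(derive(start, 0, len(tokens)))


# expr -> expr "+" expr | "n" is ambiguous: "n+n+n" groups left or right.
GRAMMAR = {"expr": [("expr", "+", "expr"), ("n",)]}
trees = all_trees(GRAMMAR, "expr", list("n+n+n"))
for tree in trees:
    print(tree)
```

For this grammar the number of trees grows with the Catalan numbers as more `+` operators are added, which is why materializing every derivation is discouraged for highly ambiguous grammars or long inputs.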


**dynamic_complete**

**TODO: Add documentation on dynamic_complete**

# LALR(1)

[LALR(1)](https://www.wikiwand.com/en/LALR_parser) is a very efficient, tried-and-true parsing algorithm. It's incredibly fast and requires very little memory. It can parse most programming languages (For example: Python and Java).

Lark comes with an efficient implementation that outperforms every other parsing library for Python (including PLY).

Lark extends the traditional YACC-based architecture with a *contextual lexer*, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.

The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of tokens. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and makes it possible to parse languages that LALR(1) was previously incapable of parsing.

This is an improvement to LALR(1) that is unique to Lark.
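The effect of restricting the lexer to the terminals that are legal in the current parser state can be shown with a toy example. This is not Lark's contextual lexer: the `TERMINALS` table, the `next_token` helper, and the way the allowed set is passed in by hand are all assumptions made for illustration. In a real parser the allowed set would come from the LALR(1) parse table.

```python
import re

# Two terminals collide: the text "if" matches both IF and NAME.
TERMINALS = {
    "IF": re.compile(r"if"),
    "NAME": re.compile(r"[a-z]+"),
    "NUMBER": re.compile(r"[0-9]+"),
}


def next_token(text, pos, allowed):
    """Return (terminal_name, lexeme) using only the terminals in `allowed`.

    The longest match wins; on a tie, the terminal listed first in `allowed` wins.
    """
    best = None
    for name in allowed:
        match = TERMINALS[name].match(text, pos)
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    if best is None:
        raise ValueError(f"no allowed terminal matches at position {pos}")
    return best


# A traditional lexer considers every terminal, so "if" is always the keyword.
print(next_token("if", 0, ["IF", "NAME", "NUMBER"]))

# In a parser state where only a NAME may follow, the same text is a NAME.
print(next_token("if", 0, ["NAME"]))
```

Narrowing the candidate set is what removes the collision: the keyword and the identifier never compete in a state where only one of them is legal.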

# CYK Parser

A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-free grammar at O(n^3*|G|).

It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars.
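The O(n^3*|G|) bound comes from the three nested loops over span length, start position, and split point, with a grammar lookup inside. A minimal CYK recognizer shows this structure. It is a sketch, not Lark's CYK parser: it assumes the grammar has already been converted to Chomsky normal form, and the `unary` and `binary` table layout is invented for this example.

```python
def cyk_recognize(unary, binary, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    unary:  {terminal: {A, ...}}  for rules A -> terminal
    binary: {(B, C): {A, ...}}    for rules A -> B C
    Assumption: the grammar is in Chomsky normal form.
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][length] = nonterminals that derive tokens[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][1] = set(unary.get(token, ()))

    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            for split in range(1, length):    # where the span is cut in two
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= binary.get((left, right), set())
    return start in table[0][n]


# CNF version of expr -> expr "+" expr | "n":
#   E -> E R | "n"      R -> P E      P -> "+"
UNARY = {"n": {"E"}, "+": {"P"}}
BINARY = {("E", "R"): {"E"}, ("P", "E"): {"R"}}

print(cyk_recognize(UNARY, BINARY, "E", list("n+n+n")))
print(cyk_recognize(UNARY, BINARY, "E", list("n+")))
```

Unlike Earley, the table is filled for every span whether or not it can contribute to a parse, which matches the remark that CYK is slow on simple grammars but does not get slower as ambiguity increases.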

docs/philosophy.md (+3, -3)

@@ -2,7 +2,7 @@


Parsers are innately complicated and confusing. They're difficult to understand, difficult to write, and difficult to use. Even experts on the subject can become baffled by the nuances of these complicated state-machines.


Lark's mission is to make the process of writing them as simple and abstract as possible. by the following design principles:
Lark's mission is to make the process of writing them as simple and abstract as possible, by following these design principles:


### Design Principles


@@ -49,9 +49,9 @@ To improve performance, you can skip building the tree for LALR(1), by providing


### 3. Earley is the default


The Earley algorithm can accept *any* context-free grammar you throw at it (i.e. any grammar you can write in EBNF, it can parse). That makes it extremely useful for beginners, who are not aware of the strange and arbitrary restrictions that LALR(1) places on its grammars.
The Earley algorithm can accept *any* context-free grammar you throw at it (i.e. any grammar you can write in EBNF, it can parse). That makes it extremely friendly to beginners, who are not aware of the strange and arbitrary restrictions that LALR(1) places on its grammars.


As the users grow to understand the structure of their grammar, the scope of their target language and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.
As the users grow to understand the structure of their grammar, the scope of their target language, and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.


In short, "Premature optimization is the root of all evil."




mkdocs.yml (+1, -0)

@@ -4,6 +4,7 @@ pages:
- Main Page: index.md
- Philosophy: philosophy.md
- Features: features.md
- Parsers: parsers.md
- How To Use (Guide): how_to_use.md
- Grammar Reference: grammar.md
- Tree Construction Reference: tree_construction.md

