Docs: tools page + more explanations

tags/gm/2021-09-23T00Z/github.com--lark-parser-lark/0.11.3
Erez Sh 3 years ago
parent
commit
e9f4c0304a
9 changed files with 92 additions and 13 deletions
  1. docs/features.md (+2, -2)
  2. docs/how_to_use.md (+23, -1)
  3. docs/index.rst (+2, -2)
  4. docs/parsers.md (+23, -1)
  5. docs/philosophy.md (+2, -0)
  6. docs/tools.md (+31, -7)
  7. lark/tools/nearley.py (+3, -0)
  8. lark/tools/serialize.py (+3, -0)
  9. lark/tools/standalone.py (+3, -0)

docs/features.md (+2, -2)

@@ -7,7 +7,7 @@
- Implements a parse-aware lexer that provides a better power of expression than traditional LALR implementations (such as ply).
- EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md))
- Builds a parse-tree (AST) automagically based on the grammar
- Stand-alone parser generator - create a small independent parser to embed in your project.
- Stand-alone parser generator - create a small independent parser to embed in your project. ([read more](tools.md))
- Flexible error handling by using an interactive parser interface (LALR only)
- Automatic line & column tracking (for both tokens and matched rules)
- Automatic terminal collision resolution
@@ -24,7 +24,7 @@

- Import rules and tokens from other Lark grammars, for code reuse and modularity.
- Support for external regex module ([see here](classes.html#using-unicode-character-classes-with-regex))
- Import grammars from Nearley.js ([read more](nearley.md))
- Import grammars from Nearley.js ([read more](tools.md))
- CYK parser
- Visualize your parse trees as dot or png files ([see_example](https://github.com/lark-parser/lark/blob/master/examples/fruitflies.py))



docs/how_to_use.md (+23, -1)

@@ -26,7 +26,7 @@ Read the tutorials to get a better understanding of how everything works. (links

Use the [Cheatsheet (PDF)](https://lark-parser.readthedocs.io/en/latest/_static/lark_cheatsheet.pdf) for quick reference.

Use the reference pages for more in-depth explanations. (links in the [main page](/index)]
Use the reference pages for more in-depth explanations. (links in the [main page](/index))

## Debug

@@ -59,3 +59,25 @@ a: "a"
'''
p = Lark(collision_grammar, parser='lalr', debug=True)
```

## Tools

### Stand-alone parser

Lark can generate a stand-alone LALR(1) parser from a grammar.

The resulting module provides the same interface as Lark, but with a fixed grammar, and reduced functionality.

Run using:

```bash
python -m lark.tools.standalone
```

For a play-by-play, read the [tutorial](http://blog.erezsh.com/create-a-stand-alone-lalr1-parser-in-python/)

### Import Nearley.js grammars

It is possible to import Nearley grammars into Lark. The Javascript code is translated using Js2Py.

Read the [reference page](nearley.md)

docs/index.rst (+2, -2)

@@ -37,7 +37,7 @@ Welcome to Lark's documentation!
classes
visitors
forest
nearley
tools



@@ -102,7 +102,7 @@ Resources
- :doc:`visitors`
- :doc:`forest`
- :doc:`classes`
- :doc:`nearley`
- :doc:`tools`
- `Cheatsheet (PDF)`_

- Discussion


docs/parsers.md (+23, -1)

@@ -42,9 +42,17 @@ Warning: This lexer can be much slower, especially for open-ended terminals such

[LALR(1)](https://www.wikiwand.com/en/LALR_parser) is a very efficient, true-and-tested parsing algorithm. It's incredibly fast and requires very little memory. It can parse most programming languages (For example: Python and Java).

LALR(1) stands for:

- Left-to-right parsing order

- Rightmost derivation, bottom-up

- Lookahead of 1 token

Lark comes with an efficient implementation that outperforms every other parsing library for Python (including PLY)

Lark extends the traditional YACC-based architecture with a *contextual lexer*, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.
Lark extends the traditional YACC-based architecture with a *contextual lexer*, which processes feedback from the parser, making the LALR(1) algorithm stronger than ever.

The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of terminals. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and allows one to parse languages that LALR(1) was previously incapable of parsing.
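
The idea in the paragraph above can be illustrated with a minimal sketch. This is not Lark's implementation: the terminal table and the `match_terminal` helper are hypothetical names made up for the example, and the "parser state" is reduced to a plain set of terminal names the parser would accept next.

```python
import re

# Hypothetical terminal table for illustration; not Lark's internal representation.
TERMINALS = {
    "IF": re.compile(r"if"),
    "NAME": re.compile(r"[a-z]+"),
    "NUMBER": re.compile(r"\d+"),
}

def match_terminal(text, pos, allowed):
    """Return (terminal_name, lexeme) for the longest match among `allowed`, or None."""
    best = None
    for name in TERMINALS:  # dict order doubles as priority when lengths tie
        if name not in allowed:
            continue
        m = TERMINALS[name].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# A plain lexer considers every terminal, so "if" collides:
# IF and NAME both match, and the tie is broken by priority.
print(match_terminal("if", 0, set(TERMINALS)))  # ('IF', 'if')

# A contextual lexer only considers terminals that are legal in the current
# parser state. Where only NAME is expected, the collision never arises.
print(match_terminal("if", 0, {"NAME"}))        # ('NAME', 'if')
```

In this toy version the caller supplies `allowed` by hand; in a real LALR(1) setup that set would presumably come from the parse table entry for the current state.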

@@ -52,6 +60,20 @@ The contextual lexer communicates with the parser, and uses the parser's lookahe

This is an improvement to LALR(1) that is unique to Lark.

### Grammar constraints in LALR(1)

Due to having only a lookahead of one token, LALR is limited in its ability to choose between rules, when they both match the input.

Tips for writing a conforming grammar:

- Try to avoid writing different rules that can match the same sequence of characters.

- For the best performance, prefer left-recursion over right-recursion.

- Consider setting terminal priority only as a last resort.

For a better understanding of these constraints, it's recommended to learn how a SLR parser works. SLR is very similar to LALR but much simpler.
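
One way to see why left-recursion tends to suit a bottom-up parser is to hand-simulate the parse stack for a list of items. The sketch below is not Lark code: both functions are hypothetical, hard-coded simulations of a shift-reduce parser for the rules `items: items ITEM | ITEM` (left-recursive) and `items: ITEM items | ITEM` (right-recursive), and they track only how deep the stack gets.

```python
def max_stack_left_recursive(n_items):
    """Simulate `items: items ITEM | ITEM`: a reduction follows every shift."""
    stack, deepest = [], 0
    for _ in range(n_items):
        stack.append("ITEM")                 # shift
        deepest = max(deepest, len(stack))
        if stack == ["items", "ITEM"]:
            stack[-2:] = ["items"]           # reduce: items ITEM -> items
        else:
            stack[-1:] = ["items"]           # reduce: ITEM -> items
    return deepest

def max_stack_right_recursive(n_items):
    """Simulate `items: ITEM items | ITEM`: nothing reduces until input ends."""
    stack = ["ITEM"] * n_items               # shift every token first
    deepest = len(stack)
    if stack:
        stack[-1:] = ["items"]               # reduce the last ITEM -> items
    while len(stack) > 1:
        stack[-2:] = ["items"]               # reduce: ITEM items -> items
    return deepest

print(max_stack_left_recursive(1000))   # 2
print(max_stack_right_recursive(1000))  # 1000
```

With the left-recursive rule the stack stays at a constant depth, while the right-recursive rule has to hold the whole input on the stack before the first reduction.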

## CYK Parser

A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-free grammar at O(n^3*|G|).


docs/philosophy.md (+2, -0)

@@ -53,6 +53,8 @@ The Earley algorithm can accept *any* context-free grammar you throw at it (i.e.

As the users grow to understand the structure of their grammar, the scope of their target language, and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.

Both Earley and LALR(1) can use the same grammar, as long as all constraints are satisfied.

In short, "Premature optimization is the root of all evil."

### Other design features


docs/nearley.md → docs/tools.md

@@ -1,26 +1,49 @@
# Importing grammars from Nearley
# Tools (Stand-alone, Nearley)

## Stand-alone parser

Lark can generate a stand-alone LALR(1) parser from a grammar.

The resulting module provides the same interface as Lark, but with a fixed grammar, and reduced functionality.

Run using:

```bash
python -m lark.tools.standalone
```

For a play-by-play, read the [tutorial](http://blog.erezsh.com/create-a-stand-alone-lalr1-parser-in-python/)


## Importing grammars from Nearley.js

Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for Javascript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the Javascript postprocessing code segments.

## Requirements
#### Requirements

1. Install Lark with the `nearley` component:
```bash
pip install lark-parser[nearley]
```

2. Acquire a copy of the nearley codebase. This can be done using:
2. Acquire a copy of the Nearley codebase. This can be done using:
```bash
git clone https://github.com/Hardmath123/nearley
```

## Usage
#### Usage

The tool can be run using:

```bash
python -m lark.tools.nearley <grammar.ne> <start_rule> <path_to_nearley_repo>
```

Here's an example of how to import nearley's calculator example into Lark:

```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main ./nearley > ncalc.py
```

You can use the output as a regular python module:
@@ -38,10 +61,11 @@ git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley --es6 > ncalc.py
```

## Notes
#### Notes

- Lark currently cannot import templates from Nearley

- Lark currently cannot export grammars to Nearley

These might get added in the future, if enough users ask for them.
These might get added in the future, if enough users ask for them.


lark/tools/nearley.py (+3, -0)

@@ -194,5 +194,8 @@ def get_arg_parser():

if __name__ == '__main__':
    parser = get_arg_parser()
    if len(sys.argv)==1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    print(main(fn=args.nearley_grammar, start=args.start_rule, nearley_lib=args.nearley_lib, es6=args.es6))
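
The guard added here (print the full help and exit with status 1 when no arguments are given) is a common `argparse` pattern. Below is a minimal, self-contained sketch of it. The tool name, the `grammar_file` argument, and the `run` helper are hypothetical and not part of Lark; `run` takes `argv` as a parameter and returns the exit status instead of calling `sys.exit`, purely so the behaviour is easy to exercise.

```python
import argparse
import sys

def build_parser():
    # Hypothetical parser for illustration; these names are not Lark's.
    parser = argparse.ArgumentParser(prog="mytool", description="Example tool")
    parser.add_argument("grammar_file", help="path to the grammar file")
    return parser

def run(argv):
    """`argv` mirrors sys.argv: argv[0] is the program name."""
    parser = build_parser()
    if len(argv) == 1:
        # No arguments at all: show the full help text on stderr
        # instead of argparse's terse "arguments are required" error.
        parser.print_help(sys.stderr)
        return 1
    args = parser.parse_args(argv[1:])
    print("would process", args.grammar_file)
    return 0
```

A script entry point would typically call `sys.exit(run(sys.argv))`. Without the guard, `argparse` would still reject the empty command line, but with a short usage error rather than the full option list.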

lark/tools/serialize.py (+3, -0)

@@ -23,6 +23,9 @@ def serialize(lark_inst, outfile):


def main():
    if len(sys.argv)==1:
        argparser.print_help(sys.stderr)
        sys.exit(1)
    ns = argparser.parse_args()
    serialize(*build_lalr(ns))



lark/tools/standalone.py (+3, -0)

@@ -181,6 +181,9 @@ def main():
            parents=[lalr_argparser], epilog='Look at the Lark documentation for more info on the options')
    parser.add_argument("old_start", nargs='?', help=SUPPRESS)
    parser.add_argument('-c', '--compress', action='store_true', default=0, help="Enable compression")
    if len(sys.argv)==1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    ns = parser.parse_args()
    if ns.old_start is not None:
        warn('The syntax `python -m lark.tools.standalone <grammar-file> <start>` is deprecated. Use the -s option')
