Docs: tools page + more explanations

3 years ago · e9f4c0304a
--- a/docs/features.md
+++ b/docs/features.md
@@ -7,7 +7,7 @@
   - Implements a parse-aware lexer that provides a better power of expression than traditional LALR implementations (such as ply).
 - EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md))
 - Builds a parse-tree (AST) automagically based on the grammar
 - Stand-alone parser generator - create a small independent parser to embed in your project.
 - Stand-alone parser generator - create a small independent parser to embed in your project. ([read more](tools.md))
 - Flexible error handling by using an interactive parser interface (LALR only)
 - Automatic line & column tracking (for both tokens and matched rules)
 - Automatic terminal collision resolution
@@ -24,7 +24,7 @@

  - Import rules and tokens from other Lark grammars, for code reuse and modularity.
  - Support for external regex module ([see here](classes.html#using-unicode-character-classes-with-regex))
  - Import grammars from Nearley.js ([read more](nearley.md))
  - Import grammars from Nearley.js ([read more](tools.md))
  - CYK parser
  - Visualize your parse trees as dot or png files ([see_example](https://github.com/lark-parser/lark/blob/master/examples/fruitflies.py))

--- a/docs/how_to_use.md
+++ b/docs/how_to_use.md
@@ -26,7 +26,7 @@ Read the tutorials to get a better understanding of how everything works. (links

 Use the [Cheatsheet (PDF)](https://lark-parser.readthedocs.io/en/latest/_static/lark_cheatsheet.pdf) for quick reference.

 Use the reference pages for more in-depth explanations. (links in the [main page](/index)]
 Use the reference pages for more in-depth explanations. (links in the [main page](/index))

 ## Debug

@@ -59,3 +59,25 @@ a: "a"
 '''
 p = Lark(collision_grammar, parser='lalr', debug=True)
 ```

 ## Tools

 ### Stand-alone parser

 Lark can generate a stand-alone LALR(1) parser from a grammar.

 The resulting module provides the same interface as Lark, but with a fixed grammar and reduced functionality.

 Run using:

 ```bash
 python -m lark.tools.standalone
 ```

 For a play-by-play, read the [tutorial](http://blog.erezsh.com/create-a-stand-alone-lalr1-parser-in-python/)

 ### Import Nearley.js grammars

 It is possible to import Nearley grammars into Lark. The JavaScript code is translated using Js2Py.

 Read the [reference page](tools.md).
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -37,7 +37,7 @@ Welcome to Lark's documentation!
   classes
   visitors
   forest
   nearley
   tools



@@ -102,7 +102,7 @@ Resources
   -  :doc:`visitors`
   -  :doc:`forest`
   -  :doc:`classes`
   -  :doc:`nearley`
   -  :doc:`tools`
   -  `Cheatsheet (PDF)`_

 -  Discussion
--- a/docs/parsers.md
+++ b/docs/parsers.md
@@ -42,9 +42,17 @@ Warning: This lexer can be much slower, especially for open-ended terminals such

 [LALR(1)](https://www.wikiwand.com/en/LALR_parser) is a very efficient, tried-and-tested parsing algorithm. It's incredibly fast and requires very little memory. It can parse most programming languages (for example, Python and Java).

 LALR(1) stands for:

 - Left-to-right parsing order

 - Rightmost derivation, bottom-up

 - Lookahead of 1 token

 Lark comes with an efficient implementation that outperforms every other parsing library for Python (including PLY)

 Lark extends the traditional YACC-based architecture with a *contextual lexer*, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.
 Lark extends the traditional YACC-based architecture with a *contextual lexer*, which processes feedback from the parser, making the LALR(1) algorithm stronger than ever.

 The contextual lexer communicates with the parser, and uses the parser's lookahead prediction to narrow its choice of terminals. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It’s surprisingly effective at resolving common terminal collisions, and allows one to parse languages that LALR(1) was previously incapable of parsing.
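
The idea can be illustrated with a toy sketch. This is not Lark's implementation: the `TERMINALS` table and the `match_terminal` helper are hypothetical names made up for this example, and the `accepted` set stands in for whatever terminals the parser could legally see next in its current state.

```python
import re

# Hypothetical terminal table for illustration; not Lark's internal data.
# "if" matches both patterns, so the two terminals collide.
TERMINALS = {
    "IF": re.compile(r"if"),
    "NAME": re.compile(r"[a-z]+"),
}


def match_terminal(text, pos, accepted):
    """Return (terminal_name, lexeme) for the first accepted terminal matching at pos."""
    for name, pattern in TERMINALS.items():
        if name not in accepted:
            continue  # terminals the parser cannot accept here are never tried
        m = pattern.match(text, pos)
        if m:
            return name, m.group()
    raise ValueError(f"no accepted terminal matches at position {pos}")


# Where the parser expects the keyword, "if" lexes as IF ...
print(match_terminal("if", 0, accepted={"IF"}))
# ... but where only an identifier is legal, the same text lexes as NAME.
print(match_terminal("if", 0, accepted={"NAME"}))
```

Because the set of candidates shrinks to what the parser can accept, the collision between the two terminals never has to be resolved by priority in these states.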

@@ -52,6 +60,20 @@ The contextual lexer communicates with the parser, and uses the parser's lookahe

 This is an improvement to LALR(1) that is unique to Lark.

 ### Grammar constraints in LALR(1)

 Because it has a lookahead of only one token, LALR(1) is limited in its ability to choose between rules when more than one of them matches the input.

 Tips for writing a conforming grammar:

 - Try to avoid writing different rules that can match the same sequence of characters.

 - For the best performance, prefer left-recursion over right-recursion.

 - Consider setting terminal priority only as a last resort.

 For a better understanding of these constraints, it's recommended to learn how an SLR parser works. SLR is very similar to LALR but much simpler.
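
One commonly cited reason behind the left-recursion tip is the size of the parse stack. The sketch below is a hand-written simulation of a shift-reduce stack for two rule shapes, `list: list ITEM | ITEM` and `list: ITEM list | ITEM`. It is not Lark's parser, and both function names are made up for this illustration.

```python
def max_stack_left_recursive(n_items):
    """Simulate shift-reduce parsing of `list: list ITEM | ITEM`."""
    stack, deepest = [], 0
    for _ in range(n_items):
        stack.append("ITEM")               # shift
        deepest = max(deepest, len(stack))
        if stack == ["ITEM"]:
            stack[:] = ["list"]            # reduce  list: ITEM
        else:
            stack[-2:] = ["list"]          # reduce  list: list ITEM
    return deepest


def max_stack_right_recursive(n_items):
    """Simulate shift-reduce parsing of `list: ITEM list | ITEM`."""
    stack, deepest = [], 0
    for _ in range(n_items):
        stack.append("ITEM")               # shift; nothing can be reduced yet
        deepest = max(deepest, len(stack))
    if stack:
        stack[-1:] = ["list"]              # reduce  list: ITEM  (the last item)
    while len(stack) > 1:
        stack[-2:] = ["list"]              # reduce  list: ITEM list
    return deepest


print(max_stack_left_recursive(1000), max_stack_right_recursive(1000))  # 2 1000
```

With the left-recursive shape each item is reduced as soon as it is shifted, so the stack stays at a constant depth. The right-recursive shape has to shift the whole input before the first reduction can happen.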

 ## CYK Parser

 A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-free grammar at O(n^3*|G|).
--- a/docs/philosophy.md
+++ b/docs/philosophy.md
@@ -53,6 +53,8 @@ The Earley algorithm can accept *any* context-free grammar you throw at it (i.e.

 As the users grow to understand the structure of their grammar, the scope of their target language, and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.

 Both Earley and LALR(1) can use the same grammar, as long as the grammar satisfies the constraints of LALR(1).

 In short, "Premature optimization is the root of all evil."

 ### Other design features
--- a/docs/nearley.md
+++ b/docs/nearley.md
@@ -1,26 +1,49 @@
 # Importing grammars from Nearley
 # Tools (Stand-alone, Nearley)

 ## Stand-alone parser

 Lark can generate a stand-alone LALR(1) parser from a grammar.

 The resulting module provides the same interface as Lark, but with a fixed grammar and reduced functionality.

 Run using:

 ```bash
 python -m lark.tools.standalone
 ```

 For a play-by-play, read the [tutorial](http://blog.erezsh.com/create-a-stand-alone-lalr1-parser-in-python/)


 ## Importing grammars from Nearley.js

 Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for JavaScript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the JavaScript postprocessing code segments.

 ## Requirements
 #### Requirements

 1. Install Lark with the `nearley` component:
 ```bash
 pip install lark-parser[nearley]
 ```

 2. Acquire a copy of the nearley codebase. This can be done using:
 2. Acquire a copy of the Nearley codebase. This can be done using:
 ```bash
 git clone https://github.com/Hardmath123/nearley
 ```

 ## Usage
 #### Usage

 The tool can be run using:

 ```bash
 python -m lark.tools.nearley <grammar.ne> <start_rule> <path_to_nearley_repo>
 ```

 Here's an example of how to import Nearley's calculator example into Lark:

 ```bash
 git clone https://github.com/Hardmath123/nearley
 python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
 python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main ./nearley > ncalc.py
 ```

 You can use the output as a regular Python module:
@@ -38,10 +61,11 @@ git clone https://github.com/Hardmath123/nearley
 python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley --es6 > ncalc.py
 ```

 ## Notes
 #### Notes

 - Lark currently cannot import templates from Nearley

 - Lark currently cannot export grammars to Nearley

 These might get added in the future, if enough users ask for them.
 These might get added in the future, if enough users ask for them.

--- a/lark/tools/nearley.py
+++ b/lark/tools/nearley.py
@@ -194,5 +194,8 @@ def get_arg_parser():

 if __name__ == '__main__':
    parser = get_arg_parser()
    if len(sys.argv)==1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    print(main(fn=args.nearley_grammar, start=args.start_rule, nearley_lib=args.nearley_lib, es6=args.es6))
--- a/lark/tools/serialize.py
+++ b/lark/tools/serialize.py
@@ -23,6 +23,9 @@ def serialize(lark_inst, outfile):


 def main():
    if len(sys.argv)==1:
        argparser.print_help(sys.stderr)
        sys.exit(1)
    ns = argparser.parse_args()
    serialize(*build_lalr(ns))

--- a/lark/tools/standalone.py
+++ b/lark/tools/standalone.py
@@ -181,6 +181,9 @@ def main():
                            parents=[lalr_argparser], epilog='Look at the Lark documentation for more info on the options')
    parser.add_argument("old_start", nargs='?', help=SUPPRESS)
    parser.add_argument('-c', '--compress', action='store_true', default=0, help="Enable compression")
    if len(sys.argv)==1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    ns = parser.parse_args()
    if ns.old_start is not None:
        warn('The syntax `python -m lark.tools.standalone <grammar-file> <start>` is deprecated. Use the -s option')
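
The three tool entry points touched above share one pattern: when the script is invoked with no arguments, print the full help to stderr and exit with status 1, rather than letting `argparse` emit a terse usage error. A minimal stand-alone sketch of that pattern follows. The `build_parser` and `run` names and the `grammar_file` argument are hypothetical, not the tools' real interface, and `run` returns the status instead of calling `sys.exit` so the logic can be exercised directly.

```python
import argparse
import sys


def build_parser():
    # Hypothetical stand-in for the tools' real argument parsers.
    parser = argparse.ArgumentParser(prog="example-tool")
    parser.add_argument("grammar_file", help="path to a grammar file")
    return parser


def run(argv):
    """Return an exit status for the given argv (program name included)."""
    parser = build_parser()
    if len(argv) == 1:
        # Only the program name was given: show the full help text.
        parser.print_help(sys.stderr)
        return 1
    args = parser.parse_args(argv[1:])
    print(f"would process {args.grammar_file}")
    return 0
```

A script would typically wire this up with `sys.exit(run(sys.argv))` under an `if __name__ == '__main__':` guard.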