From 608395cd4d027d6c7b0e72757a3e97f271450b4c Mon Sep 17 00:00:00 2001
From: Erez
Date: Sat, 12 Jan 2019 14:44:57 +0200
Subject: [PATCH] Improved docs

---
 docs/features.md   | 6 +++---
 docs/grammar.md    | 8 ++++++--
 docs/philosophy.md | 2 +-
 docs/recipes.md    | 2 ++
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/docs/features.md b/docs/features.md
index e9f9109..33eb544 100644
--- a/docs/features.md
+++ b/docs/features.md
@@ -1,4 +1,4 @@
-# Features
+# Main Features

  - EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md))
  - Builds a parse-tree (AST) automagically based on the grammar
@@ -47,8 +47,9 @@ A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context-

 Its too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars.

-# Other features
+# Extra features

+ - Import rules and tokens from other Lark grammars, for code reuse and modularity.
  - Import grammars from Nearley.js

 ### Experimental features
@@ -59,4 +60,3 @@ Its too slow to be practical for simple grammars, but it offers good performance
  - Grammar composition
  - LALR(k) parser
  - Full regexp-collision support using NFAs
- - Automatically produce syntax-highlighters for grammars, for popular IDEs
diff --git a/docs/grammar.md b/docs/grammar.md
index 87912a5..466b349 100644
--- a/docs/grammar.md
+++ b/docs/grammar.md
@@ -109,6 +109,10 @@ four_words: word ~ 4

 All occurrences of the terminal will be ignored, and won't be part of the parse.

+Using the `%ignore` directive results in a cleaner grammar.
+
+It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar harms its predictive abilities, which are based on a lookahead of 1.
+
 **Syntax:**
 ```html
 %ignore
@@ -122,9 +126,9 @@ COMMENT: "#" /[^\n]/*
 ```
 ### %import

-Allows to import terminals from lark grammars.
+Allows importing terminals and rules from lark grammars.

-Future versions will allow to import rules and macros.
+When importing rules, all their dependencies will be imported into a namespace, to avoid collisions. It's not possible to override their dependencies (e.g. like you would when inheriting a class).

 **Syntax:**
 ```html
diff --git a/docs/philosophy.md b/docs/philosophy.md
index 9d77ee0..270f95d 100644
--- a/docs/philosophy.md
+++ b/docs/philosophy.md
@@ -45,7 +45,7 @@ And anyway, every parse-tree can be replayed as a state-machine, so there is no

 See this answer in more detail [here](https://github.com/erezsh/lark/issues/4).

-You can skip the building the tree for LALR(1), by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)).
+To improve performance, you can skip building the tree for LALR(1), by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)).

 ### 3. Earley is the default

diff --git a/docs/recipes.md b/docs/recipes.md
index 6c36564..2202ab7 100644
--- a/docs/recipes.md
+++ b/docs/recipes.md
@@ -22,6 +22,8 @@ It only works with the standard and contextual lexers.
 from lark import Lark, Token

 def tok_to_int(tok):
+    "Convert the value of `tok` from string to int, while maintaining line number & column."
+    # tok.type == 'INT'
     return Token.new_borrow_pos(tok.type, int(tok), tok)

 parser = Lark("""