| @@ -1,4 +1,4 @@ | |||||
| # Features | |||||
| # Main Features | |||||
| - EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md)) | - EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md)) | ||||
| - Builds a parse-tree (AST) automagically based on the grammar | - Builds a parse-tree (AST) automagically based on the grammar | ||||
| @@ -47,8 +47,9 @@ A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context- | |||||
| It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars. | It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars. | ||||
| # Other features | |||||
| # Extra features | |||||
| - Import rules and tokens from other Lark grammars, for code reuse and modularity. | |||||
| - Import grammars from Nearley.js | - Import grammars from Nearley.js | ||||
| ### Experimental features | ### Experimental features | ||||
| @@ -59,4 +60,3 @@ Its too slow to be practical for simple grammars, but it offers good performance | |||||
| - Grammar composition | - Grammar composition | ||||
| - LALR(k) parser | - LALR(k) parser | ||||
| - Full regexp-collision support using NFAs | - Full regexp-collision support using NFAs | ||||
| - Automatically produce syntax-highlighters for grammars, for popular IDEs | |||||
| @@ -109,6 +109,10 @@ four_words: word ~ 4 | |||||
| All occurrences of the terminal will be ignored, and won't be part of the parse. | All occurrences of the terminal will be ignored, and won't be part of the parse. | ||||
| Using the `%ignore` directive results in a cleaner grammar. | |||||
| It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar harms its predictive abilities, which are based on a lookahead of 1. | |||||
| **Syntax:** | **Syntax:** | ||||
| ```html | ```html | ||||
| %ignore <TERMINAL> | %ignore <TERMINAL> | ||||
| @@ -122,9 +126,9 @@ COMMENT: "#" /[^\n]/* | |||||
| ``` | ``` | ||||
| ### %import | ### %import | ||||
| Allows to import terminals from lark grammars. | |||||
| Allows importing terminals and rules from Lark grammars. | |||||
| Future versions will allow to import rules and macros. | |||||
| When importing rules, all their dependencies will be imported into a namespace, to avoid collisions. It's not possible to override their dependencies (e.g. like you would when inheriting a class). | |||||
| **Syntax:** | **Syntax:** | ||||
| ```html | ```html | ||||
| @@ -45,7 +45,7 @@ And anyway, every parse-tree can be replayed as a state-machine, so there is no | |||||
| See this answer in more detail [here](https://github.com/erezsh/lark/issues/4). | See this answer in more detail [here](https://github.com/erezsh/lark/issues/4). | ||||
| You can skip the building the tree for LALR(1), by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)). | |||||
| To improve performance, you can skip building the tree for LALR(1) by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)). | |||||
| ### 3. Earley is the default | ### 3. Earley is the default | ||||
| @@ -22,6 +22,8 @@ It only works with the standard and contextual lexers. | |||||
| from lark import Lark, Token | from lark import Lark, Token | ||||
| def tok_to_int(tok): | def tok_to_int(tok): | ||||
| "Convert the value of `tok` from string to int, while maintaining line number & column." | |||||
| # tok.type == 'INT' | |||||
| return Token.new_borrow_pos(tok.type, int(tok), tok) | return Token.new_borrow_pos(tok.type, int(tok), tok) | ||||
| parser = Lark(""" | parser = Lark(""" | ||||