@@ -1,4 +1,4 @@ | |||||
# Features | |||||
# Main Features | |||||
- EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md)) | - EBNF-inspired grammar, with extra features (See: [Grammar Reference](grammar.md)) | ||||
- Builds a parse-tree (AST) automagically based on the grammar | - Builds a parse-tree (AST) automagically based on the grammar | ||||
@@ -47,8 +47,9 @@ A [CYK parser](https://www.wikiwand.com/en/CYK_algorithm) can parse any context- | |||||
It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars. | It's too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars. | ||||
# Other features | |||||
# Extra features | |||||
- Import rules and tokens from other Lark grammars, for code reuse and modularity. | |||||
- Import grammars from Nearley.js | - Import grammars from Nearley.js | ||||
### Experimental features | ### Experimental features | ||||
@@ -59,4 +60,3 @@ Its too slow to be practical for simple grammars, but it offers good performance | |||||
- Grammar composition | - Grammar composition | ||||
- LALR(k) parser | - LALR(k) parser | ||||
- Full regexp-collision support using NFAs | - Full regexp-collision support using NFAs | ||||
- Automatically produce syntax-highlighters for grammars, for popular IDEs |
@@ -109,6 +109,10 @@ four_words: word ~ 4 | |||||
All occurrences of the terminal will be ignored, and won't be part of the parse. | All occurrences of the terminal will be ignored, and won't be part of the parse. | ||||
Using the `%ignore` directive results in a cleaner grammar. | |||||
It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar harms its predictive abilities, which are based on a lookahead of 1. | |||||
**Syntax:** | **Syntax:** | ||||
```html | ```html | ||||
%ignore <TERMINAL> | %ignore <TERMINAL> | ||||
@@ -122,9 +126,9 @@ COMMENT: "#" /[^\n]/* | |||||
``` | ``` | ||||
### %import | ### %import | ||||
Allows to import terminals from lark grammars. | |||||
Allows importing terminals and rules from Lark grammars. | |||||
Future versions will allow to import rules and macros. | |||||
When importing rules, all their dependencies will be imported into a namespace, to avoid collisions. It's not possible to override their dependencies (e.g. like you would when inheriting a class). | |||||
**Syntax:** | **Syntax:** | ||||
```html | ```html | ||||
@@ -45,7 +45,7 @@ And anyway, every parse-tree can be replayed as a state-machine, so there is no | |||||
See this answer in more detail [here](https://github.com/erezsh/lark/issues/4). | See this answer in more detail [here](https://github.com/erezsh/lark/issues/4). | ||||
You can skip the building the tree for LALR(1), by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)). | |||||
To improve performance, you can skip building the tree for LALR(1), by providing Lark with a transformer (see the [JSON example](https://github.com/erezsh/lark/blob/master/examples/json_parser.py)). | |||||
### 3. Earley is the default | ### 3. Earley is the default | ||||
@@ -22,6 +22,8 @@ It only works with the standard and contextual lexers. | |||||
from lark import Lark, Token | from lark import Lark, Token | ||||
def tok_to_int(tok): | def tok_to_int(tok): | ||||
"Convert the value of `tok` from string to int, while maintaining line number & column." | |||||
# tok.type == 'INT' | |||||
return Token.new_borrow_pos(tok.type, int(tok), tok) | return Token.new_borrow_pos(tok.type, int(tok), tok) | ||||
parser = Lark(""" | parser = Lark(""" | ||||