Lark is a general-purpose parsing library. It’s written in Python, and supports two parsing algorithms: Earley (default) and LALR(1).
Lark is a re-write of my previous parsing library, PlyPlus.
Lark accepts its grammars in EBNF form.
The grammar is a list of rules and tokens, each on its own line.
Rules can be defined on multiple lines when using the OR operator ( | ).
Comments start with // and last to the end of the line (C++ style).
Lark begins the parse with the rule ‘start’, unless specified otherwise in the options.
Tokens are defined in terms of:
NAME : "string" or /regexp/
NAME.ignore : ..
.ignore is a flag that drops the token before it reaches the parser. It is typically used for whitespace.
Example:
IF: "if"
INTEGER : /[0-9]+/
WHITESPACE.ignore: /[ \t\n]+/
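To make the effect of the .ignore flag concrete, here is a minimal sketch of a regex-based tokenizer written with Python's standard `re` module. It is not Lark's lexer: the `TOKEN_DEFS` table and the `tokenize` function are hypothetical names made up for this illustration, and the table simply mirrors the three token definitions in the example above.

```python
import re

# Hypothetical token table mirroring the example: (name, pattern, ignore flag).
TOKEN_DEFS = [
    ("IF", r"if", False),
    ("INTEGER", r"[0-9]+", False),
    ("WHITESPACE", r"[ \t\n]+", True),
]

def tokenize(text):
    """Yield (name, value) pairs, skipping tokens whose ignore flag is set."""
    pos = 0
    while pos < len(text):
        for name, pattern, ignore in TOKEN_DEFS:
            match = re.compile(pattern).match(text, pos)
            if match:
                if not ignore:
                    yield name, match.group()
                pos = match.end()
                break
        else:
            # No definition matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")

tokens = list(tokenize("if 42\n"))
```

In this sketch `tokens` contains only the IF and INTEGER tokens. The whitespace is matched and consumed, but it is never handed on to whatever would consume the token stream.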
Each rule is defined in terms of:
name : list of items to match
| another list of items -> optional_alias
| etc.
An alias is a name for the specific rule alternative. It affects tree construction.
An item is a:
Example:
float: "-"? DIGIT* "." DIGIT+ exp
| "-"? DIGIT+ exp
exp: "-"? ("e" | "E") DIGIT+
DIGIT: /[0-9]/
Lark builds a tree automatically based on the structure of the grammar. It also accepts some hints.
In general, Lark will place each rule as a branch, and its matches as the children of the branch.
Using item+ or item* will result in a list of items.
Example:
expr: "(" expr ")"
| NAME+
NAME: /\w+/
Lark will parse “(((hello world)))” as:
expr
    expr
        expr
            expr
                "hello"
                "world"
The brackets do not appear in the tree by design.
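The branch-per-rule behaviour can be sketched with a small hand-written recursive-descent parser for the same `expr` grammar. This is only an illustration in plain Python, not how Lark works internally: `tokenize` and `parse_expr` are hypothetical helpers, branches are modelled as `(name, children)` tuples, and skipping whitespace between names is an assumption of the sketch.

```python
import re

def tokenize(text):
    # Split into brackets and NAME tokens. Skipping whitespace is an
    # assumption of this sketch; the grammar above does not declare it.
    return re.findall(r"[()]|\w+", text)

def parse_expr(tokens, pos=0):
    """Parse  expr: "(" expr ")" | NAME+  and return (branch, next_pos)."""
    if pos < len(tokens) and tokens[pos] == "(":
        child, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        # The brackets are consumed but not stored in the branch.
        return ("expr", [child]), pos + 1
    names = []
    while pos < len(tokens) and tokens[pos] not in ("(", ")"):
        names.append(tokens[pos])
        pos += 1
    if not names:
        raise SyntaxError("expected NAME")
    # NAME+ becomes a flat list of children.
    return ("expr", names), pos

tree, end = parse_expr(tokenize("(hello world)"))
```

Here `tree` is an `expr` branch whose only child is another `expr` branch holding the two names. Each bracket pair contributes one level of nesting and leaves no token of its own behind.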
Tokens that won’t appear in the tree are:
Tokens that will appear in the tree are:
a. Rules whose name begins with an underscore will be inlined into their containing rule.
Example:
start: "(" _greet ")"
_greet: /\w+/ /\w+/
Lark will parse “(hello world)” as:
start
    "hello"
    "world"
b. Rules that receive a question mark (?) at the beginning of their definition will be inlined if they have a single child.
Example:
start: greet greet
?greet: "(" /\w+/ ")"
| /\w+/ /\w+/
Lark will parse “hello world (planet)” as:
start
    greet
        "hello"
        "world"
    "planet"
c. Aliases - options in a rule can receive an alias. It will then be used as the branch name for that option.
Example:
start: greet greet
greet: "hello" -> hello
| "world"
Lark will parse “hello world” as:
start
    hello
    greet
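The three hints above can be summarised as one small tree-shaping function. This is a sketch under stated assumptions, not Lark's implementation: `shape` is a hypothetical helper, branches are modelled as `(name, children)` tuples, and the order in which the hints are checked is an assumption, since the text does not say how they combine.

```python
def shape(name, children, expand_single=False, alias=None):
    """Build one branch and return the list of nodes to splice into the parent.

    name          -- rule name as written in the grammar
    children      -- already-shaped child nodes (strings or (name, children) tuples)
    expand_single -- True for rules declared with a leading "?"
    alias         -- branch name given by "-> alias" on the matched option
    """
    if name.startswith("_"):
        return list(children)                  # hint a: always inlined
    if expand_single and len(children) == 1:
        return list(children)                  # hint b: inlined when it has one child
    return [(alias or name, list(children))]   # hint c: the alias renames the branch

# Hint a: start: "(" _greet ")" applied to "(hello world)"
tree_a = shape("start", shape("_greet", ["hello", "world"]))[0]

# Hint b: start: greet greet with ?greet, applied to "hello world (planet)"
tree_b = shape(
    "start",
    shape("greet", ["hello", "world"], expand_single=True)
    + shape("greet", ["planet"], expand_single=True),
)[0]

# Hint c: "hello world"; the quoted strings leave no children, as in the tree above
tree_c = shape("start", shape("greet", [], alias="hello") + shape("greet", []))[0]
```

Each call returns a list so that an inlined rule can hand its children directly to its parent, while a regular rule hands over a single branch.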
When initializing the Lark object, you can provide it with keyword options:
To be supported: