Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.
For example, the rule node: child1 child2 will create a tree node with two children. If it is matched as part of another rule (i.e. if it isn’t the root), that other rule’s tree node will become its parent.
Using item+ or item* will result in a list of items, equivalent to writing item item item ...
Using item? will return the item if it matched, or nothing.
If maybe_placeholders=False (the default), then [] behaves like ()?.
If maybe_placeholders=True, then using [item] will return the item if it matched, or the value None if it didn’t.
Terminals are always values in the tree, never branches.
Lark filters out certain types of terminals by default, considering them punctuation:
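Terminals that won’t appear in the tree are:
- Unnamed literals (like "keyword" or "+")
- Terminals whose name starts with an underscore (like _DIGIT)
Terminals that will appear in the tree are:
- Unnamed regular expressions (like /[0-9]/)
- Named terminals whose name starts with a letter (like DIGIT)
Note: Terminals composed of literals and other terminals always include the entire match without filtering any part.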
Example:
start: PNAME pname
PNAME: "(" NAME ")"
pname: "(" NAME ")"
NAME: /\w+/
%ignore /\s+/
Lark will parse “(Hello) (World)” as:
start
    (Hello)
    pname World
Rules prefixed with ! will retain all their literals regardless.
Example:
expr: "(" expr ")"
    | NAME+
NAME: /\w+/
%ignore " "
Lark will parse “((hello world))” as:
expr
    expr
        expr
            "hello"
            "world"
The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.
Users can alter the automatic construction of the tree using a collection of grammar features.
Rules whose name begins with an underscore are inlined into their containing rule.
Example:
start: "(" _greet ")"
_greet: /\w+/ /\w+/
Lark will parse “(hello world)” as:
start
    "hello"
    "world"
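Rules whose definition begins with a question mark (?) are inlined if they have a single child after filtering.
Example: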
start: greet greet
?greet: "(" /\w+/ ")"
      | /\w+/ /\w+/
Lark will parse “hello world (planet)” as:
start
    greet
        "hello"
        "world"
    "planet"
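Rules that begin with an exclamation mark (!) keep all their terminals (they won’t get filtered).
Example:
!expr: "(" expr ")"
     | NAME+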
NAME: /\w+/
%ignore " "
Lark will parse “((hello world))” as:
expr
  (
  expr
    (
    expr
      hello
      world
    )
  )
Using the ! prefix is usually a “code smell”, and may point to a flaw in your grammar design.
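Options in a rule can receive an alias, which is then used as the branch name for that option instead of the rule name.
Example: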
start: greet greet
greet: "hello"
     | "world" -> planet
Lark will parse “hello world” as:
start
    greet
    planet