A collection of recipes to use Lark and its various features
Transformers are the common interface for processing matched rules and tokens.
They can be used during parsing for better performance.
from lark import Lark, Transformer
class T(Transformer):
def INT(self, tok):
"Convert the value of `tok` from string to int, while maintaining line number & column."
return tok.update(value=int(tok))
parser = Lark("""
start: INT*
%import common.INT
%ignore " "
""", parser="lalr", transformer=T())
print(parser.parse('3 14 159'))
Prints out:
Tree(start, [Token(INT, 3), Token(INT, 14), Token(INT, 159)])
lexer_callbacks
can be used to interface with the lexer as it generates tokens.
It accepts a dictionary of the form
{TOKEN_TYPE: callback}
Where callback is of type f(Token) -> Token
It only works with the standard and contextual lexers.
This has the same effect of using a transformer, but can also process ignored tokens.
from lark import Lark
comments = []
parser = Lark("""
start: INT*
COMMENT: /#.*/
%import common (INT, WS)
%ignore COMMENT
%ignore WS
""", parser="lalr", lexer_callbacks={'COMMENT': comments.append})
parser.parse("""
1 2 3 # hello
# world
4 5 6
""")
print(comments)
Prints out:
[Token(COMMENT, '# hello'), Token(COMMENT, '# world')]
Note: We don’t have to return a token, because comments are ignored
Parsing ambiguous texts with earley and ambiguity='explicit'
produces a single tree with _ambig
nodes to mark where the ambiguity occured.
However, it’s sometimes more convenient instead to work with a list of all possible unambiguous trees.
Lark provides a utility transformer for that purpose:
from lark import Lark, Tree, Transformer
from lark.visitors import CollapseAmbiguities
grammar = """
!start: x y
!x: "a" "b"
| "ab"
| "abc"
!y: "c" "d"
| "cd"
| "d"
"""
parser = Lark(grammar, ambiguity='explicit')
t = parser.parse('abcd')
for x in CollapseAmbiguities().transform(t):
print(x.pretty())
This prints out:
start
x
a
b
y
c
d
start
x ab
y cd
start
x abc
y d
While convenient, this should be used carefully, as highly ambiguous trees will soon create an exponential explosion of such unambiguous derivations.
The following visitor assigns a parent
attribute for every node in the tree.
If your tree nodes aren’t unique (if there is a shared Tree instance), the assert will fail.
class Parent(Visitor):
def __default__(self, tree):
for subtree in tree.children:
if isinstance(subtree, Tree):
assert not hasattr(subtree, 'parent')
subtree.parent = tree