Parse any context-free grammar, FAST and EASY!
Beginners: Forget everything you knew about parsers. Lark’s algorithm can quickly parse any context-free grammar you throw at it, no matter how complicated. It also constructs a parse tree for you, with no additional code on your part.
Experts: Lark lets you choose between Earley and LALR(1), to trade off power against speed. It also contains experimental features, such as a contextual lexer.
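For a feel of what the Earley side of that trade-off does, here is a minimal Earley recognizer in plain Python. It's a textbook sketch, not Lark’s implementation: it only answers yes or no (it builds no tree), and it assumes the grammar has no empty productions. The grammar format (a dict mapping each rule name to tuples of symbols) is made up for this example. Note that it accepts left-recursive rules such as `sum: sum "+" product` as written, which a plain recursive-descent parser can't handle without rewriting the grammar.

```python
def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    grammar: dict mapping a nonterminal to a list of alternatives, each a
    tuple of symbols. Any symbol that is not a key of `grammar` is a
    terminal and must equal the input token. Assumes no empty productions.
    """
    # chart[i] holds Earley items (lhs, rhs, dot, origin) ending at position i
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: start every alternative of the expected rule here
                    for alt in grammar[symbol]:
                        item = (symbol, alt, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scan: the terminal matches, so move past it
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`
                for wlhs, wrhs, wdot, worigin in list(chart[origin]):
                    if wdot < len(wrhs) and wrhs[wdot] == lhs:
                        item = (wlhs, wrhs, wdot + 1, worigin)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[-1])


# A left-recursive arithmetic grammar, used without any rewriting
GRAMMAR = {
    "sum": [("sum", "+", "product"), ("product",)],
    "product": [("product", "*", "num"), ("num",)],
    "num": [("1",), ("2",), ("3",)],
}

print(earley_recognize(GRAMMAR, "sum", "1 + 2 * 3".split()))  # True
print(earley_recognize(GRAMMAR, "sum", "1 + * 3".split()))    # False
```

LALR(1) avoids this chart of items by compiling the grammar into a state table up front, which is why it's faster but accepts a narrower class of grammars.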
Lark can:
And many more features. Read ahead and find out.
Here is a little program to parse “Hello, World!” (Or any other similar phrase):
from lark import Lark
parser = Lark('''start: WORD "," WORD "!"
                 %import common.WORD
                 %ignore " "
              ''')
print(parser.parse("Hello, World!"))
And the output is:
Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])
Notice that the punctuation doesn’t appear in the resulting tree: Lark filters it out automatically.
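To illustrate the filtering rule, here is a hand-written parser for the same “Hello, World!” grammar in plain Python. It's a hypothetical sketch, not how Lark works internally: the token table and the names `COMMA`, `BANG`, `SPACE`, `tokenize` and `parse_hello` are made up here. The point it shows is that the anonymous literals `","` and `"!"` must still be matched, but only the named `WORD` terminals end up in the tree.

```python
import re

# Hypothetical token table for:  start: WORD "," WORD "!"  with  %ignore " "
# Each entry: (terminal name, compiled regex, role)
TOKENS = [
    ("WORD", re.compile(r"[A-Za-z]+"), "keep"),   # named terminal: kept in the tree
    ("COMMA", re.compile(r","), "filter"),        # anonymous "," literal: matched, then dropped
    ("BANG", re.compile(r"!"), "filter"),         # anonymous "!" literal: matched, then dropped
    ("SPACE", re.compile(r" +"), "ignore"),       # %ignore " ": never reaches the parser
]


def tokenize(text):
    """Yield (name, value, role) for every non-ignored token in `text`."""
    pos = 0
    while pos < len(text):
        for name, regex, role in TOKENS:
            match = regex.match(text, pos)
            if match:
                if role != "ignore":
                    yield name, match.group(), role
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")


def parse_hello(text):
    tokens = list(tokenize(text))
    # The grammar requires exactly WORD "," WORD "!" ...
    if [name for name, _, _ in tokens] != ["WORD", "COMMA", "WORD", "BANG"]:
        raise SyntaxError("expected: WORD ',' WORD '!'")
    # ... but only the named terminals become children of the tree
    return ("start", [(name, value) for name, value, role in tokens if role == "keep"])


print(parse_hello("Hello, World!"))
# ('start', [('WORD', 'Hello'), ('WORD', 'World')])
```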
from lark import Lark, InlineTransformer
parser = Lark('''?sum: product
                   | sum "+" product   -> add
                   | sum "-" product   -> sub
                 ?product: item
                   | product "*" item  -> mul
                   | product "/" item  -> div
                 ?item: NUMBER         -> number
                   | "-" item          -> neg
                   | "(" sum ")"
                 %import common.NUMBER
                 %import common.WS
                 %ignore WS
              ''', start='sum')
class CalculateTree(InlineTransformer):
    # Each rule name maps to a function that receives the node's children as arguments
    from operator import add, sub, mul, truediv as div, neg
    number = float
def calc(expr):
    return CalculateTree().transform(parser.parse(expr))
In the grammar, we shape the resulting tree. The ‘->’ operator renames branches, and the ‘?’ prefix tells Lark to inline a rule’s node when it has only a single child. (See the tutorial for a more in-depth explanation.)
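The effect of ‘?’ can be pictured as a pass over the tree that splices out single-child nodes of the marked rules. The sketch below is a simplified model in plain Python, not Lark’s code: the tuple-based tree `(rule_name, [children])` and the function name `inline_single_children` are assumptions made for illustration, and the alias on `NUMBER -> number` is left out to keep the trees short.

```python
# Hypothetical tree representation: (rule_name, [children]); leaves are token strings.
# This only mimics the effect of '?' (inline a single-child node); it is not Lark's code.

def inline_single_children(node, inline_rules):
    """Replace any node whose rule is in `inline_rules` and has one child by that child."""
    if isinstance(node, str):  # a token: nothing to inline
        return node
    name, children = node
    children = [inline_single_children(child, inline_rules) for child in children]
    if name in inline_rules and len(children) == 1:
        return children[0]  # '?' rule with exactly one child: splice the child in
    return (name, children)


QUESTION_RULES = {"sum", "product", "item"}  # the rules marked with '?' above

# "5" on its own: three nested single-child nodes collapse to the token
print(inline_single_children(("sum", [("product", [("item", ["5"])])]), QUESTION_RULES))
# 5

# "2 + 3": the aliased 'add' node stays; its single-child operands collapse
print(inline_single_children(("add", [("product", [("item", ["2"])]), ("item", ["3"])]),
                             QUESTION_RULES))
# ('add', ['2', '3'])
```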
Then the transformer evaluates the tree and returns a number:
>>> calc("(200 + 3*-3) * 7")
1337.0
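Conceptually, the transformer walks the tree bottom-up and calls the function named after each node, passing the already-transformed children as positional arguments. Here is that idea as a plain-Python sketch; the tuple-based tree and the `OPS` table are assumptions for illustration, not Lark’s `Tree` or `InlineTransformer` internals. The tree is written out by hand in the shape the grammar above produces for `"(200 + 3*-3) * 7"`, after the parentheses are filtered and the ‘?’ rules are inlined.

```python
from operator import add, sub, mul, truediv, neg

# Hypothetical tree representation: (rule_name, [children]); leaves are NUMBER token text.
OPS = {"add": add, "sub": sub, "mul": mul, "div": truediv, "neg": neg, "number": float}


def transform(node):
    """Evaluate a (rule_name, [children]) tree bottom-up."""
    if isinstance(node, str):  # a token: pass it through unchanged
        return node
    name, children = node
    args = [transform(child) for child in children]  # children are transformed first
    return OPS[name](*args)  # "inline": children become positional arguments


# Hand-built tree for "(200 + 3*-3) * 7"
tree = ("mul", [
    ("add", [
        ("number", ["200"]),
        ("mul", [("number", ["3"]), ("neg", [("number", ["3"])])]),
    ]),
    ("number", ["7"]),
])

print(transform(tree))  # 1337.0
```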
Lark can resolve ambiguity automatically by choosing the simplest derivation. Alternatively, you can ask it to return all the possible parse trees, wrapped in a meta “_ambig” node.
Here’s a toy example to parse the famously ambiguous phrase: “fruit flies like bananas”
from lark import Lark
grammar = """
sentence: noun verb noun -> simple
| noun verb "like" noun -> comparative
noun: adj? NOUN
verb: VERB
adj: ADJ
NOUN: "flies" | "bananas" | "fruit"
VERB: "like" | "flies"
ADJ: "fruit"
%import common.WS
%ignore WS
"""
parser = Lark(grammar, start='sentence', ambiguity='explicit') # Explicit ambiguity in parse tree!
tree = parser.parse('fruit flies like bananas')
from lark.tree import pydot__tree_to_png    # Just a neat utility function (requires pydot)
pydot__tree_to_png(tree, "examples/fruitflies.png")
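To see why the sentence is ambiguous, here is a brute-force enumerator in plain Python that tries every way of splitting the four words according to the grammar above. It's a hypothetical illustration only (the helper names `nouns`, `verbs` and `sentences` are made up, and Lark’s Earley parser does not work this way); it simply confirms that exactly two derivations exist, one per alternative of `sentence`.

```python
from itertools import product

# Terminal sets copied from the grammar above
NOUN = {"flies", "bananas", "fruit"}
VERB = {"like", "flies"}
ADJ = {"fruit"}


def nouns(words):
    """Every way to read `words` as   noun: adj? NOUN"""
    if len(words) == 1 and words[0] in NOUN:
        yield ("noun", [words[0]])
    if len(words) == 2 and words[0] in ADJ and words[1] in NOUN:
        yield ("noun", [("adj", [words[0]]), words[1]])


def verbs(words):
    """Every way to read `words` as   verb: VERB"""
    if len(words) == 1 and words[0] in VERB:
        yield ("verb", [words[0]])


def sentences(words):
    """Yield every parse tree of `words` under the 'sentence' rule."""
    n = len(words)
    # simple: noun verb noun  (try every pair of split points)
    for i in range(1, n):
        for j in range(i + 1, n):
            for parts in product(nouns(words[:i]), verbs(words[i:j]), nouns(words[j:])):
                yield ("simple", list(parts))
    # comparative: noun verb "like" noun  (the literal "like" is left out of the tree)
    for i in range(1, n):
        for j in range(i + 1, n - 1):
            if words[j] == "like":
                for parts in product(nouns(words[:i]), verbs(words[i:j]), nouns(words[j + 1:])):
                    yield ("comparative", list(parts))


# Prints two trees: one 'simple' (fruit-flies like bananas)
# and one 'comparative' (fruit flies like bananas do)
for parse in sentences("fruit flies like bananas".split()):
    print(parse)
```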
$ pip install lark-parser
Lark has no dependencies.
Using Lark? Send me a message and I’ll add your project!
Lark comes with a tool to convert grammars from Nearley, a popular Earley library for JavaScript. It uses Js2Py to convert and run the JavaScript postprocessing code segments.
Here’s an example:
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
You can use the output as a regular Python module:
>>> import ncalc
>>> ncalc.parse('sin(pi/4) ^ e')
0.38981434460254655
These features are planned to be implemented in the near future:
These features may be implemented some day:
Separates code from grammar: Parsers written this way are cleaner and easier to read and work with.
Automatically builds a parse tree (AST): Trees are usually simpler to work with than state machines. (But if you want to provide a callback for efficiency reasons, Lark lets you do that too.)
Follows Python’s Idioms: Beautiful is better than ugly. Readability counts.
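The tree-versus-callback trade-off mentioned above can be shown with a small hand-written parser in plain Python whose reductions call a table of actions. This is a hypothetical sketch, not Lark’s callback mechanism: `parse`, `EVALUATE`, `BUILD_TREE` and `tree_builder` are names invented here. Pass actions that compute, and you get the answer with no tree ever built; pass actions that build nodes, and you get a tree to process later.

```python
import re
from operator import add, sub, mul, truediv, neg

NUMBER = r"\d+(?:\.\d+)?"


def parse(text, actions):
    """Recursive-descent parser for + - * / unary minus and parentheses.

    `actions` maps 'add', 'sub', 'mul', 'div', 'neg' and 'number' to callables,
    invoked as soon as each construct is recognized (reduce-time callbacks).
    """
    tokens = re.findall(NUMBER + r"|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def item():
        tok = advance()
        if tok == "(":
            value = sum_()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok == "-":
            return actions["neg"](item())
        if re.fullmatch(NUMBER, tok):
            return actions["number"](tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def product():
        value = item()
        while peek() in ("*", "/"):
            op = advance()
            value = actions["mul" if op == "*" else "div"](value, item())
        return value

    def sum_():
        value = product()
        while peek() in ("+", "-"):
            op = advance()
            value = actions["add" if op == "+" else "sub"](value, product())
        return value

    result = sum_()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


# Callbacks that compute directly: no tree is ever built
EVALUATE = {"add": add, "sub": sub, "mul": mul, "div": truediv, "neg": neg, "number": float}


def tree_builder(name):
    """Return a callback that wraps its arguments in a (name, [children]) node."""
    return lambda *children: (name, list(children))


# Callbacks that build a tree instead, for later processing
BUILD_TREE = {name: tree_builder(name) for name in EVALUATE}

print(parse("(200 + 3*-3) * 7", EVALUATE))  # 1337.0
print(parse("1 + 2", BUILD_TREE))           # ('add', [('number', ['1']), ('number', ['2'])])
```

The computing callbacks save the memory and the second pass that a tree costs, at the price of mixing evaluation logic into parsing.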
Code | CPython Time | PyPy Time | CPython Mem | PyPy Mem |
---|---|---|---|---|
Lark - LALR(1) | 4.7s | 1.2s | 70M | 134M |
PyParsing | 32s | 3.5s | 443M | 225M |
funcparserlib | 8.5s | 1.3s | 483M | 293M |
Parsimonious | 5.7s | 1545M |
Check out the JSON tutorial for more details on how the comparison was made.
Library | Algorithm | LOC | Grammar | Builds tree? |
---|---|---|---|---|
Lark | Earley/LALR(1) | 0.5K | EBNF+ | Yes! |
PLY | LALR(1) | 4.6K | Yacc-like BNF | No |
PyParsing | PEG | 5.7K | Parser combinators | No |
Parsley | PEG | 3.3K | EBNF-like | No |
funcparserlib | Recursive-Descent | 0.5K | Parser combinators | No |
Parsimonious | PEG | ? | EBNF | Yes |
(LOC measures lines of code of the parsing algorithm(s), without accompanying files)
It’s hard to compare parsers with different parsing algorithms, since each algorithm has many advantages and disadvantages. However, I will try to summarize the main points here:
Lark offers both Earley and LALR(1), which means you can choose between the most powerful and the most efficient algorithms, without having to change libraries.
(* According to Wikipedia, it remains an open question whether PEGs can parse all deterministic CFGs.)
Lark uses the MIT license.
Lark is currently accepting pull requests.
There are many ways you can help the project:
If you’re interested in taking one of these on, let me know and I will provide more details and assist you in the process.
If you have any questions or want my assistance, you can email me at erezshin at gmail com.
I’m also available for contract work.