- Refactored lexer interface into LexerConf
- Lexer now compiles regexps only when used (especially useful for ContextualLexer)
- Lexer no longer validates on deserialization (noticeable speedup)
- Makes rule ordering the default ambiguity tie breaker.
E.g.
    start: a | b
    a: "A"
    b: "A"
will return:
    start
      a
while
    start: b | a
    a: "A"
    b: "A"
will return:
    start
      b
- Replaces ambiguity='resolve__antiscore_sum' with a separate option: 'priority'.
The priority option has 4 values: 'auto', 'none', 'normal', 'invert'.
'auto' maps to 'normal' for CYK and Earley, and to 'none' for LALR.
'none' filters out your priorities and ignores them. This saves some extra tree walking on Earley.
'normal' uses your priorities untouched, mimicking the old behaviour.
'invert' negates your priorities, emulating the old 'resolve__antiscore_sum' behaviour.
This allows you to use priority logic even when ambiguity=='explicit', to get a better idea
of the shape of your tree. It also lets you disable priorities for testing (or performance)
without removing them from the grammar.
- ambiguity='explicit' now correctly returns an ambiguous tree again, as 0.6 did.
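
To make the 'priority' option described above concrete, here is a minimal sketch of the mapping it describes. The helpers `resolve_priority` and `effective_priority` are hypothetical names used only for illustration; they are not part of the library, and the real implementation may differ.

```python
# Hypothetical helpers: they only restate the behaviour described in the notes above.
VALID_PRIORITIES = ("auto", "none", "normal", "invert")


def resolve_priority(parser, priority="auto"):
    """Return the concrete priority mode for a given parser name."""
    if priority not in VALID_PRIORITIES:
        raise ValueError("unknown priority option: %r" % (priority,))
    if parser not in ("earley", "cyk", "lalr"):
        raise ValueError("unknown parser: %r" % (parser,))
    if priority == "auto":
        # 'auto' maps to 'normal' for CYK and Earley, and to 'none' for LALR.
        return "normal" if parser in ("earley", "cyk") else "none"
    return priority


def effective_priority(value, mode):
    """Return the priority a rule effectively gets under a resolved mode."""
    if mode == "none":
        return None      # priorities are filtered out and ignored
    if mode == "invert":
        return -value    # priorities are negated
    return value         # 'normal': priorities are used untouched


print(resolve_priority("earley"))          # normal
print(resolve_priority("lalr"))            # none
print(effective_priority(3, "invert"))     # -3
```

An explicit value always wins over 'auto' in this sketch, so `resolve_priority("lalr", "invert")` yields 'invert'.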
Changed dynamic lexer behavior to only match terminals to their maximum length (i.e. greedy match), emulating the standard lexer.
The original dynamic lexer behavior, which attempts every possible match of a terminal rather than only the longest one, has been moved to the "dynamic_complete" lexer.
For example, when applying a terminal "a"+ to the text "aaa":
- dynamic: ["aaa"]
- dynamic_complete: ["a", "aa", "aaa"]
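
The difference between the two modes can be illustrated with Python's standard `re` module alone. This is only a sketch of the matching idea at a single start position, not the lexer's actual code; the function names `greedy_match` and `complete_matches` are made up for this example, and the regular expression `a+` stands in for the terminal "a"+.

```python
import re


def greedy_match(pattern, text, pos=0):
    """Like 'dynamic': keep only the longest match starting at pos."""
    m = re.compile(pattern).match(text, pos)
    return [m.group(0)] if m else []


def complete_matches(pattern, text, pos=0):
    """Like 'dynamic_complete': keep every non-empty match starting at pos."""
    regex = re.compile(pattern)
    results = []
    for end in range(pos + 1, len(text) + 1):
        # fullmatch with pos/endpos checks whether text[pos:end] matches entirely
        if regex.fullmatch(text, pos, end):
            results.append(text[pos:end])
    return results


print(greedy_match("a+", "aaa"))      # ['aaa']
print(complete_matches("a+", "aaa"))  # ['a', 'aa', 'aaa']
```

The complete variant offers the parser more candidates to try, which is presumably why the greedy behavior is now the default for the dynamic lexer.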
* All exceptions are now under exceptions.py
* UnexpectedInput is now the superclass of UnexpectedToken and UnexpectedCharacters,
all of which support the get_context() and match_examples() methods.
Previously, anonymous tokens would become visible if they had the same value as named tokens,
because the two are merged for the lexer. After this change, the rules for
visibility are based on how a token is used in the rule, and not on its name or identity.