|
|
@@ -176,6 +176,27 @@ You can use the output as a regular python module: |
|
|
|
0.38981434460254655 |
|
|
|
``` |
|
|
|
|
|
|
|
### Using Unicode character classes with `regex` |
|
|
|
Python's builtin `re` module has a few persistent known bugs and also won't parse |
|
|
|
advanced regex features such as character classes. |
|
|
|
With `pip install lark-parser[regex]`, the `regex` module will be installed alongside `lark` |
|
|
|
and can act as a drop-in replacement to `re`. |
|
|
|
|
|
|
|
Any instance of `Lark` instantiated with `regex=True` will now use the `regex` module |
|
|
|
instead of `re`. For example, we can now use character classes to match PEP-3131 compliant Python identifiers. |
|
|
|
```python |
|
|
|
from lark import Lark |
|
|
|
>>> g = Lark(r""" |
|
|
|
?start: NAME |
|
|
|
NAME: ID_START ID_CONTINUE* |
|
|
|
ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/ |
|
|
|
ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/ |
|
|
|
""", regex=True) |
|
|
|
|
|
|
|
>>> g.parse('வணக்கம்') |
|
|
|
'வணக்கம்' |
|
|
|
|
|
|
|
``` |
|
|
|
|
|
|
|
## License |
|
|
|
|
|
|
|