diff --git a/README.md b/README.md
index 1c7062c..02b89d7 100644
--- a/README.md
+++ b/README.md
@@ -176,6 +176,27 @@ You can use the output as a regular python module:
 0.38981434460254655
 ```
 
+### Using Unicode character classes with `regex`
+Python's built-in `re` module has a few persistent known bugs and also won't parse
+advanced regex features such as Unicode character classes (`\p{...}`).
+With `pip install lark-parser[regex]`, the `regex` module will be installed alongside `lark`
+and can act as a drop-in replacement for `re`.
+
+Any instance of `Lark` instantiated with `regex=True` will now use the `regex` module
+instead of `re`. For example, we can now use character classes to match PEP 3131-compliant Python identifiers:
+```python
+>>> from lark import Lark
+>>> g = Lark(r"""
+    ?start: NAME
+    NAME: ID_START ID_CONTINUE*
+    ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
+    ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
+    """, regex=True)
+
+>>> g.parse('வணக்கம்')
+'வணக்கம்'
+
+```
 
 ## License
 