From 535aebab3c770d5b3acbe6fa21394c901a1f2345 Mon Sep 17 00:00:00 2001
From: Erez Shinan <erezshin+git@gmail.com>
Date: Wed, 11 Sep 2019 01:05:15 +0300
Subject: [PATCH] Added to docs (Issue #400)

---
 docs/grammar.md | 57 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
diff --git a/docs/grammar.md b/docs/grammar.md
index 9343ee4..228c8b7 100644
--- a/docs/grammar.md
+++ b/docs/grammar.md
@@ -1,5 +1,13 @@
 # Grammar Reference
 
+Table of contents:
+
+1. [Definitions](#defs)
+1. [Terminals](#terms)
+1. [Rules](#rules)
+1. [Directives](#dirs)
+
+<a name="defs"></a>
 ## Definitions
 
 **A grammar** is a list of rules and terminals, that together define a language.
@@ -25,6 +33,7 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o
 Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects, for the shape of the generated parse-tree, and the automatic construction of the lexer (aka tokenizer, or scanner).
 
 
+<a name="terms"></a>
 ## Terminals
 
 Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.
@@ -70,6 +79,53 @@ WHITESPACE: (" " | /\t/ )+
 SQL_SELECT: "select"i
 ```
 
+### Regular expressions & Ambiguity
+
+Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.
+
+For example, in the following grammar, `A1` and `A2` are equivalent:
+```perl
+A1: "a" | "b"
+A2: /a|b/
+```
+
+This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.
+
+For example, for this grammar:
+```perl
+start           : (A | B)+
+A               : "a" | "ab"
+B               : "b"
+```
+We get this behavior (where `p` is a `Lark` instance built from the grammar above):
+
+```python
+>>> p.parse("ab")
+Tree(start, [Token(A, 'a'), Token(B, 'b')])
+```
+
+This happens because Python's regex engine tries alternatives from left to right and returns the first one that matches, so `A` matches `a` and never considers `ab`.
+
+If you find yourself in this situation, the recommended solution is to use rules instead.
+
+Example:
+
+```python
+>>> p = Lark("""start: (a | b)+
+...             !a: "a" | "ab"
+...             !b: "b"
+...             """, ambiguity="explicit")
+>>> print(p.parse("ab").pretty())
+_ambig
+  start
+    a   ab
+  start
+    a   a
+    b   b
+```
+
+
+<a name="rules"></a>
 ## Rules
 
 **Syntax:**
@@ -114,6 +170,7 @@ Rules can be assigned priority only when using Earley (future versions may suppo
 
 Priority can be either positive or negative. In not specified for a terminal, it's assumed to be 1 (i.e. the default).
 
+<a name="dirs"></a>
 ## Directives
 
 ### %ignore