diff --git a/docs/classes.md b/docs/classes.md
index 9943fd4..1555a1f 100644
--- a/docs/classes.md
+++ b/docs/classes.md
@@ -1,15 +1,13 @@
-# Classes - Reference
+# Classes Reference
This page details the important classes in Lark.
----
-## Lark
+## lark.Lark
The Lark class is the main interface for the library. It's mostly a thin wrapper for the many different parsers, and for the tree constructor.
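+
+A quick usage sketch (the grammar and the `parser='lalr'` choice here are illustrative, not prescriptive):
+
+```python
+from lark import Lark
+
+parser = Lark(r"""
+    start: WORD "," WORD "!"
+    WORD: /\w+/
+    %ignore " "
+""", parser='lalr')
+
+tree = parser.parse("Hello, World!")
+print(tree.pretty())
+```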
-### Methods
-
#### \_\_init\_\_(self, grammar, **options)
The Lark class accepts a grammar string or file object, and keyword options:
@@ -50,14 +48,10 @@ If a transformer is supplied to `__init__`, returns whatever is the result of th
The main tree class
-### Properties
-
* `data` - The name of the rule or alias
* `children` - List of matched sub-rules and terminals
* `meta` - Line & Column numbers, if using `propagate_positions`
-### Methods
-
#### \_\_init\_\_(self, data, children)
Creates a new tree, and stores "data" and "children" in attributes of the same name.
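+
+A minimal construction sketch:
+
+```python
+from lark import Tree
+
+t = Tree('expr', [Tree('number', ['1']), Tree('number', ['2'])])
+assert t.data == 'expr'
+assert len(t.children) == 2
+```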
@@ -92,102 +86,6 @@ Trees can be hashed and compared.
----
-## Transformers & Visitors
-
-Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.
-
-They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rule you wish to process. Each methods accepts the children as an argument. That can be modified using the `v-args` decorator, which allows to inline the arguments (akin to `*args`), or add the tree `meta` property as an argument.
-
-See: https://github.com/lark-parser/lark/blob/master/lark/visitors.py
-
-### Visitors
-
-Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.
-
-They work bottom-up, starting with the leaves and ending at the root of the tree.
-
-**Example**
-```python
-class IncreaseAllNumbers(Visitor):
- def number(self, tree):
- assert tree.data == "number"
- tree.children[0] += 1
-
-IncreaseAllNumbers().visit(parse_tree)
-```
-
-There are two classes that implement the visitor interface:
-
-* Visitor - Visit every node (without recursion)
-
-* Visitor_Recursive - Visit every node using recursion. Slightly faster.
-
-### Transformers
-
-Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.
-
-They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.
-
-Transformers can be used to implement map & reduce patterns.
-
-Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).
-
-Transformers can be chained into a new transformer by using multiplication.
-
-**Example:**
-```python
-from lark import Tree, Transformer
-
-class EvalExpressions(Transformer):
- def expr(self, args):
- return eval(args[0])
-
-t = Tree('a', [Tree('expr', ['1+2'])])
-print(EvalExpressions().transform( t ))
-
-# Prints: Tree(a, [3])
-```
-
-
-Here are the classes that implement the transformer interface:
-
-- Transformer - Recursively transforms the tree. This is the one you probably want.
-- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
-- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
-
-### v_args
-
-`v_args` is a decorator.
-
-By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.
-
-When used on a transformer/visitor class definition, it applies to all the callback methods inside it.
-
-`v_args` accepts one of three flags:
-
-- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
-- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
-- `tree` - Provides the entire tree as the argument, instead of the children.
-
-Examples:
-
-```python
-@v_args(inline=True)
-class SolveArith(Transformer):
- def add(self, left, right):
- return left + right
-
-
-class ReverseNotation(Transformer_InPlace):
- @v_args(tree=True):
- def tree_node(self, tree):
- tree.children = tree.children[::-1]
-```
-
-### Discard
-
-When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
-
## Token
When using a lexer, the resulting tokens in the trees will be of the Token class, which inherits from Python's string. So, normal string comparisons and operations will work as expected. Tokens also have other useful attributes:
@@ -199,17 +97,25 @@ When using a lexer, the resulting tokens in the trees will be of the Token class
* `end_line` - The line where the token ends
* `end_column` - The next column after the end of the token. For example, if the token is a single character with a `column` value of 4, `end_column` will be 5.
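+
+A short sketch of inspecting these attributes (assuming `token` is a `Token` leaf taken from a parse-tree):
+
+```python
+print(token.type)                        # name of the matched terminal
+print(token.line, token.column)          # where the token starts
+print(token.end_line, token.end_column)  # where it ends
+print(token.upper())                     # str operations work, since Token subclasses str
+```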
+## Transformer
+## Visitor
+## Interpreter
+
+See the [visitors page](visitors.md).
+
## UnexpectedInput
+## UnexpectedToken
+
+## UnexpectedCharacters
+
- `UnexpectedInput`
- `UnexpectedToken` - The parser received an unexpected token
- `UnexpectedCharacters` - The lexer encountered an unexpected string
After catching one of these exceptions, you may call the following helper methods to create a nicer error message:
-### Methods
-
#### get_context(text, span)
Returns a pretty string pinpointing the error in the text, with `span` characters of context around it.
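+
+A minimal sketch of the intended usage, assuming `parser` is a `Lark` instance and `text` is the input that failed to parse:
+
+```python
+from lark import UnexpectedInput
+
+try:
+    parser.parse(text)
+except UnexpectedInput as e:
+    # Pinpoint the error with span=40 characters of surrounding context
+    print(e.get_context(text, 40))
+```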
diff --git a/docs/grammar.md b/docs/grammar.md
index 9343ee4..8a8913b 100644
--- a/docs/grammar.md
+++ b/docs/grammar.md
@@ -1,5 +1,13 @@
# Grammar Reference
+Table of contents:
+
+1. [Definitions](#definitions)
+1. [Terminals](#terminals)
+1. [Rules](#rules)
+1. [Directives](#directives)
+
+
## Definitions
**A grammar** is a list of rules and terminals that together define a language.
@@ -25,6 +33,7 @@ Lark begins the parse with the rule 'start', unless specified otherwise in the o
Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects on the shape of the generated parse-tree and on the automatic construction of the lexer (aka tokenizer, or scanner).
+
## Terminals
Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.
@@ -70,6 +79,53 @@ WHITESPACE: (" " | /\t/ )+
SQL_SELECT: "select"i
```
+### Regular expressions & Ambiguity
+
+Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.
+
+For example, in the following grammar, `A1` and `A2` are equivalent:
+```perl
+A1: "a" | "b"
+A2: /a|b/
+```
+
+This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.
+
+For example, for this grammar:
+```perl
+start : (A | B)+
+A : "a" | "ab"
+B : "b"
+```
+We get this behavior:
+
+```python
+>>> p.parse("ab")
+Tree(start, [Token(A, 'a'), Token(B, 'b')])
+```
+
+This happens because Python's regex engine always returns the first matching option.
+
+If you find yourself in this situation, the recommended solution is to use rules instead.
+
+Example:
+
+```python
+>>> p = Lark("""start: (a | b)+
+... !a: "a" | "ab"
+... !b: "b"
+... """, ambiguity="explicit")
+>>> print(p.parse("ab").pretty())
+_ambig
+  start
+    a ab
+  start
+    a a
+    b b
+```
+
+
+
## Rules
**Syntax:**
@@ -114,6 +170,7 @@ Rules can be assigned priority only when using Earley (future versions may suppo
Priority can be either positive or negative. If not specified for a terminal, it's assumed to be 1 (i.e. the default).
+
## Directives
### %ignore
@@ -122,7 +179,7 @@ All occurrences of the terminal will be ignored, and won't be part of the parse.
Using the `%ignore` directive results in a cleaner grammar.
-It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extranous elements) explicitly in the grammar, harms its predictive abilities, which are based on a lookahead of 1.
+It's especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar harms its predictive abilities, which are based on a lookahead of 1.
**Syntax:**
```html
diff --git a/docs/how_to_develop.md b/docs/how_to_develop.md
index d69a1e3..b161e0c 100644
--- a/docs/how_to_develop.md
+++ b/docs/how_to_develop.md
@@ -7,7 +7,7 @@ There are many ways you can help the project:
* Write new grammars for Lark's library
* Write a blog post introducing Lark to your audience
* Port Lark to another language
-* Help me with code developemnt
+* Help me with code development
If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.
@@ -60,4 +60,4 @@ Another way to run the tests is using setup.py:
```bash
python setup.py test
-```
\ No newline at end of file
+```
diff --git a/docs/index.md b/docs/index.md
index 8517208..d693cce 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -35,8 +35,8 @@ $ pip install lark-parser
* [Examples](https://github.com/lark-parser/lark/tree/master/examples)
* Tutorials
* [How to write a DSL](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/) - Implements a toy LOGO-like language with an interpreter
- * [How to write a JSON parser](json_tutorial.md)
- * External
+ * [How to write a JSON parser](json_tutorial.md) - Teaches you how to use Lark
+ * Unofficial
* [Program Synthesis is Possible](https://www.cs.cornell.edu/~asampson/blog/minisynth.html) - Creates a DSL for Z3
* Guides
* [How to use Lark](how_to_use.md)
@@ -44,6 +44,7 @@ $ pip install lark-parser
* Reference
* [Grammar](grammar.md)
* [Tree Construction](tree_construction.md)
+ * [Visitors & Transformers](visitors.md)
* [Classes](classes.md)
* [Cheatsheet (PDF)](lark_cheatsheet.pdf)
* Discussion
diff --git a/docs/json_tutorial.md b/docs/json_tutorial.md
index ca1db73..9cc87e7 100644
--- a/docs/json_tutorial.md
+++ b/docs/json_tutorial.md
@@ -230,7 +230,8 @@ from lark import Transformer
class MyTransformer(Transformer):
def list(self, items):
return list(items)
- def pair(self, (k,v)):
+ def pair(self, key_value):
+ k, v = key_value
return k, v
def dict(self, items):
return dict(items)
@@ -251,9 +252,11 @@ Also, our definitions of list and dict are a bit verbose. We can do better:
from lark import Transformer
class TreeToJson(Transformer):
- def string(self, (s,)):
+ def string(self, s):
+ (s,) = s
return s[1:-1]
- def number(self, (n,)):
+ def number(self, n):
+ (n,) = n
return float(n)
list = list
@@ -315,9 +318,11 @@ json_grammar = r"""
"""
class TreeToJson(Transformer):
- def string(self, (s,)):
+ def string(self, s):
+ (s,) = s
return s[1:-1]
- def number(self, (n,)):
+ def number(self, n):
+ (n,) = n
return float(n)
list = list
diff --git a/docs/parsers.md b/docs/parsers.md
index fb7c997..c487238 100644
--- a/docs/parsers.md
+++ b/docs/parsers.md
@@ -5,9 +5,9 @@ Lark implements the following parsing algorithms: Earley, LALR(1), and CYK
An [Earley Parser](https://www.wikiwand.com/en/Earley_parser) is a chart parser capable of parsing any context-free grammar in O(n^3), and in O(n^2) when the grammar is unambiguous. It can parse most LR grammars in O(n). Most programming languages are LR, and can be parsed in linear time.
-Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitely using `lexer='dynamic'`.
+Lark's Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions instead of matching characters one-by-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using `lexer='dynamic'`.
-It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independant first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`
+It's possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley's ambiguity-resolution ability. So choose this only if you know why! Activate with `lexer='standard'`.
**SPPF & Ambiguity resolution**
@@ -21,7 +21,7 @@ Lark provides the following options to combat ambiguity:
1) Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule-priority syntax.
-2) Users may choose to recieve the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
+2) Users may choose to receive the set of all possible parse-trees (using ambiguity='explicit'), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn't recommended for highly ambiguous grammars, or very long inputs.
3) As an advanced feature, users may use specialized visitors to iterate the SPPF themselves. Future versions of Lark intend to improve and simplify this interface.
diff --git a/docs/visitors.md b/docs/visitors.md
new file mode 100644
index 0000000..c60c1dc
--- /dev/null
+++ b/docs/visitors.md
@@ -0,0 +1,117 @@
+## Transformers & Visitors
+
+Transformers & Visitors provide a convenient interface to process the parse-trees that Lark returns.
+
+They are used by inheriting from the correct class (visitor or transformer), and implementing methods corresponding to the rules you wish to process. Each method accepts the children as an argument. That can be modified using the `v_args` decorator, which allows you to inline the arguments (akin to `*args`), or to add the tree's `meta` property as an argument.
+
+See: [visitors.py](https://github.com/lark-parser/lark/blob/master/lark/visitors.py)
+
+### Visitors
+
+Visitors visit each node of the tree, and run the appropriate method on it according to the node's data.
+
+They work bottom-up, starting with the leaves and ending at the root of the tree.
+
+**Example**
+```python
+from lark.visitors import Visitor
+
+class IncreaseAllNumbers(Visitor):
+    def number(self, tree):
+        assert tree.data == "number"
+        tree.children[0] += 1
+
+IncreaseAllNumbers().visit(parse_tree)
+```
+
+There are two classes that implement the visitor interface:
+
+* Visitor - Visit every node (without recursion)
+
+* Visitor_Recursive - Visit every node using recursion. Slightly faster.
+
+### Transformers
+
+Transformers visit each node of the tree, and run the appropriate method on it according to the node's data.
+
+They work bottom-up (or: depth-first), starting with the leaves and ending at the root of the tree.
+
+Transformers can be used to implement map & reduce patterns.
+
+Because nodes are reduced from leaf to root, at any point the callbacks may assume the children have already been transformed (if applicable).
+
+Transformers can be chained into a new transformer by using multiplication.
+
+`Transformer` can do anything `Visitor` can do, but because it reconstructs the tree, it is slightly less efficient.
+
+
+**Example:**
+```python
+from lark import Tree, Transformer
+
+class EvalExpressions(Transformer):
+    def expr(self, args):
+        return eval(args[0])
+
+t = Tree('a', [Tree('expr', ['1+2'])])
+print(EvalExpressions().transform(t))
+
+# Prints: Tree(a, [3])
+```
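+
+Chaining with multiplication, as mentioned above. A minimal sketch, assuming `T1` and `T2` are `Transformer` subclasses:
+
+```python
+combined = T1() * T2()   # transform with T1 first, then T2 on its output
+result = combined.transform(tree)
+```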
+
+All these classes implement the transformer interface:
+
+- Transformer - Recursively transforms the tree. This is the one you probably want.
+- Transformer_InPlace - Non-recursive. Changes the tree in-place instead of returning new instances
+- Transformer_InPlaceRecursive - Recursive. Changes the tree in-place instead of returning new instances
+
+### visit_tokens
+
+By default, transformers only visit rules. `visit_tokens=True` will tell Transformer to visit tokens as well. This is a slightly slower alternative to `lexer_callbacks`, but it's easier to maintain and works for all algorithms (even when there isn't a lexer).
+
+Example:
+
+```python
+class T(Transformer):
+    INT = int
+    NUMBER = float
+    def NAME(self, name):
+        return lookup_dict.get(name, name)
+
+
+T(visit_tokens=True).transform(tree)
+```
+
+
+### v_args
+
+`v_args` is a decorator.
+
+By default, callback methods of transformers/visitors accept one argument: a list of the node's children. `v_args` can modify this behavior.
+
+When used on a transformer/visitor class definition, it applies to all the callback methods inside it.
+
+`v_args` accepts one of three flags:
+
+- `inline` - Children are provided as `*args` instead of a list argument (not recommended for very long lists).
+- `meta` - Provides two arguments: `children` and `meta` (instead of just the first)
+- `tree` - Provides the entire tree as the argument, instead of the children.
+
+Examples:
+
+```python
+from lark.visitors import Transformer, Transformer_InPlace, v_args
+
+@v_args(inline=True)
+class SolveArith(Transformer):
+    def add(self, left, right):
+        return left + right
+
+
+class ReverseNotation(Transformer_InPlace):
+    @v_args(tree=True)
+    def tree_node(self, tree):
+        tree.children = tree.children[::-1]
+```
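+
+The `meta` flag isn't shown above; here is a hedged sketch using the `(children, meta)` argument order described in the flag list (position info requires the parser's `propagate_positions` option):
+
+```python
+@v_args(meta=True)
+class ReportAdditions(Transformer):
+    def add(self, children, meta):
+        # meta carries line & column numbers when propagate_positions is enabled
+        print("add matched at line", meta.line)
+        return children
+```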
+
+### Discard
+
+When raising the `Discard` exception in a transformer callback, that node is discarded and won't appear in the parent.
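+
+A minimal sketch: dropping every `comment` node from the tree, assuming the grammar defines a `comment` rule:
+
+```python
+from lark.visitors import Transformer, Discard
+
+class StripComments(Transformer):
+    def comment(self, children):
+        raise Discard   # this node won't appear in the parent's children
+```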
+
+
diff --git a/lark/__init__.py b/lark/__init__.py
index 2b75d7a..0906eb7 100644
--- a/lark/__init__.py
+++ b/lark/__init__.py
@@ -5,4 +5,4 @@ from .exceptions import ParseError, LexError, GrammarError, UnexpectedToken, Une
from .lexer import Token
from .lark import Lark
-__version__ = "0.7.4"
+__version__ = "0.8.0rc1"
diff --git a/lark/exceptions.py b/lark/exceptions.py
index 4207589..28f1b4b 100644
--- a/lark/exceptions.py
+++ b/lark/exceptions.py
@@ -13,6 +13,14 @@ class ParseError(LarkError):
class LexError(LarkError):
pass
+class UnexpectedEOF(ParseError):
+ def __init__(self, expected):
+ self.expected = expected
+
+ message = ("Unexpected end-of-input. Expected one of: \n\t* %s\n" % '\n\t* '.join(x.name for x in self.expected))
+ super(UnexpectedEOF, self).__init__(message)
+
+
class UnexpectedInput(LarkError):
pos_in_stream = None
diff --git a/lark/lark.py b/lark/lark.py
index ae71d56..47c6fba 100644
--- a/lark/lark.py
+++ b/lark/lark.py
@@ -69,6 +69,7 @@ class LarkOptions(Serialize):
'propagate_positions': False,
'lexer_callbacks': {},
'maybe_placeholders': False,
+ 'edit_terminals': None,
}
def __init__(self, options_dict):
@@ -85,7 +86,7 @@ class LarkOptions(Serialize):
options[name] = value
- if isinstance(options['start'], str):
+ if isinstance(options['start'], STRING_TYPE):
options['start'] = [options['start']]
self.__dict__['options'] = options
@@ -205,6 +206,10 @@ class Lark(Serialize):
# Compile the EBNF grammar into BNF
self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)
+ if self.options.edit_terminals:
+ for t in self.terminals:
+ self.options.edit_terminals(t)
+
self._terminals_dict = {t.name:t for t in self.terminals}
# If the user asked to invert the priorities, negate them all here.
diff --git a/lark/lexer.py b/lark/lexer.py
index 9cd7adb..f57ae51 100644
--- a/lark/lexer.py
+++ b/lark/lexer.py
@@ -3,7 +3,7 @@
import re
from .utils import Str, classify, get_regexp_width, Py36, Serialize
-from .exceptions import UnexpectedCharacters, LexError
+from .exceptions import UnexpectedCharacters, LexError, UnexpectedToken
###{standalone
@@ -43,7 +43,7 @@ class PatternStr(Pattern):
__serialize_fields__ = 'value', 'flags'
type = "str"
-
+
def to_regexp(self):
return self._get_flags(re.escape(self.value))
@@ -166,36 +166,33 @@ class _Lex:
while line_ctr.char_pos < len(stream):
lexer = self.lexer
- for mre, type_from_index in lexer.mres:
- m = mre.match(stream, line_ctr.char_pos)
- if not m:
- continue
-
- t = None
- value = m.group(0)
- type_ = type_from_index[m.lastindex]
- if type_ not in ignore_types:
- t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
- if t.type in lexer.callback:
- t = lexer.callback[t.type](t)
- if not isinstance(t, Token):
- raise ValueError("Callbacks must return a token (returned %r)" % t)
- last_token = t
- yield t
- else:
- if type_ in lexer.callback:
- t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
- lexer.callback[type_](t)
-
- line_ctr.feed(value, type_ in newline_types)
- if t:
- t.end_line = line_ctr.line
- t.end_column = line_ctr.column
+ res = lexer.match(stream, line_ctr.char_pos)
+ if not res:
+ allowed = {v for m, tfi in lexer.mres for v in tfi.values()} - ignore_types
+ if not allowed:
+ allowed = {""}
+ raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
- break
+ value, type_ = res
+
+ t = None
+ if type_ not in ignore_types:
+ t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+ if t.type in lexer.callback:
+ t = lexer.callback[t.type](t)
+ if not isinstance(t, Token):
+ raise ValueError("Callbacks must return a token (returned %r)" % t)
+ last_token = t
+ yield t
else:
- allowed = {v for m, tfi in lexer.mres for v in tfi.values()}
- raise UnexpectedCharacters(stream, line_ctr.char_pos, line_ctr.line, line_ctr.column, allowed=allowed, state=self.state, token_history=last_token and [last_token])
+ if type_ in lexer.callback:
+ t = Token(type_, value, line_ctr.char_pos, line_ctr.line, line_ctr.column)
+ lexer.callback[type_](t)
+
+ line_ctr.feed(value, type_ in newline_types)
+ if t:
+ t.end_line = line_ctr.line
+ t.end_column = line_ctr.column
class UnlessCallback:
@@ -330,6 +327,11 @@ class TraditionalLexer(Lexer):
self.mres = build_mres(terminals)
+ def match(self, stream, pos):
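+        # Try each of the combined regexps at this position; on the first match,
+        # return (matched value, terminal type). Returns None if nothing matches.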
+ for mre, type_from_index in self.mres:
+ m = mre.match(stream, pos)
+ if m:
+ return m.group(0), type_from_index[m.lastindex]
def lex(self, stream):
return _Lex(self).lex(stream, self.newline_types, self.ignore_types)
@@ -367,9 +369,21 @@ class ContextualLexer(Lexer):
def lex(self, stream):
l = _Lex(self.lexers[self.parser_state], self.parser_state)
- for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
- yield x
- l.lexer = self.lexers[self.parser_state]
- l.state = self.parser_state
+ try:
+ for x in l.lex(stream, self.root_lexer.newline_types, self.root_lexer.ignore_types):
+ yield x
+ l.lexer = self.lexers[self.parser_state]
+ l.state = self.parser_state
+ except UnexpectedCharacters as e:
+ # In the contextual lexer, UnexpectedCharacters can mean that the terminal is defined,
+ # but not in the current context.
+ # This tests the input against the global context, to provide a nicer error.
+ root_match = self.root_lexer.match(stream, e.pos_in_stream)
+ if not root_match:
+ raise
+
+ value, type_ = root_match
+ t = Token(type_, value, e.pos_in_stream, e.line, e.column)
+ raise UnexpectedToken(t, e.allowed, state=e.state)
###}
diff --git a/lark/load_grammar.py b/lark/load_grammar.py
index 90911fd..a65ca1e 100644
--- a/lark/load_grammar.py
+++ b/lark/load_grammar.py
@@ -479,7 +479,7 @@ class Grammar:
# ===================
# Convert terminal-trees to strings/regexps
- transformer = PrepareLiterals() * TerminalTreeToPattern()
+
for name, (term_tree, priority) in term_defs:
if term_tree is None: # Terminal added through %declare
continue
@@ -487,7 +487,8 @@ class Grammar:
if len(expansions) == 1 and not expansions[0].children:
raise GrammarError("Terminals cannot be empty (%s)" % name)
- terminals = [TerminalDef(name, transformer.transform(term_tree), priority)
+ transformer = PrepareLiterals() * TerminalTreeToPattern()
+ terminals = [TerminalDef(name, transformer.transform( term_tree ), priority)
for name, (term_tree, priority) in term_defs if term_tree]
# =================
@@ -638,11 +639,10 @@ def import_from_grammar_into_namespace(grammar, namespace, aliases):
def resolve_term_references(term_defs):
- # TODO Cycles detection
# TODO Solve with transitive closure (maybe)
- token_dict = {k:t for k, (t,_p) in term_defs}
- assert len(token_dict) == len(term_defs), "Same name defined twice?"
+ term_dict = {k:t for k, (t,_p) in term_defs}
+ assert len(term_dict) == len(term_defs), "Same name defined twice?"
while True:
changed = False
@@ -655,11 +655,21 @@ def resolve_term_references(term_defs):
if item.type == 'RULE':
raise GrammarError("Rules aren't allowed inside terminals (%s in %s)" % (item, name))
if item.type == 'TERMINAL':
- exp.children[0] = token_dict[item]
+ term_value = term_dict[item]
+ assert term_value is not None
+ exp.children[0] = term_value
changed = True
if not changed:
break
+ for name, term in term_dict.items():
+ if term: # Not just declared
+ for child in term.children:
+ ids = [id(x) for x in child.iter_subtrees()]
+ if id(term) in ids:
+ raise GrammarError("Recursion in terminal '%s' (recursion is only allowed in rules, not terminals)" % name)
+
+
def options_from_rule(name, *x):
if len(x) > 1:
priority, expansions = x
diff --git a/lark/parsers/earley.py b/lark/parsers/earley.py
index a98be02..e18d26c 100644
--- a/lark/parsers/earley.py
+++ b/lark/parsers/earley.py
@@ -10,10 +10,11 @@ is better documented here:
http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/
"""
+import logging
from collections import deque
from ..visitors import Transformer_InPlace, v_args
-from ..exceptions import ParseError, UnexpectedToken
+from ..exceptions import UnexpectedEOF, UnexpectedToken
from .grammar_analysis import GrammarAnalyzer
from ..grammar import NonTerminal
from .earley_common import Item, TransitiveItem
@@ -45,12 +46,8 @@ class Parser:
# skip the extra tree walk. We'll also skip this if the user just didn't specify priorities
# on any rules.
if self.forest_sum_visitor is None and rule.options and rule.options.priority is not None:
- self.forest_sum_visitor = ForestSumVisitor()
+ self.forest_sum_visitor = ForestSumVisitor
- if resolve_ambiguity:
- self.forest_tree_visitor = ForestToTreeVisitor(self.callbacks, self.forest_sum_visitor)
- else:
- self.forest_tree_visitor = ForestToAmbiguousTreeVisitor(self.callbacks, self.forest_sum_visitor)
self.term_matcher = term_matcher
@@ -273,6 +270,7 @@ class Parser:
## Column is now the final column in the parse.
assert i == len(columns)-1
+ return to_scan
def parse(self, stream, start):
assert start, start
@@ -291,7 +289,7 @@ class Parser:
else:
columns[0].add(item)
- self._parse(stream, columns, to_scan, start_symbol)
+ to_scan = self._parse(stream, columns, to_scan, start_symbol)
# If the parse was successful, the start
# symbol should have been completed in the last step of the Earley cycle, and will be in
@@ -299,18 +297,25 @@ class Parser:
solutions = [n.node for n in columns[-1] if n.is_complete and n.node is not None and n.s == start_symbol and n.start == 0]
if self.debug:
from .earley_forest import ForestToPyDotVisitor
- debug_walker = ForestToPyDotVisitor()
- debug_walker.visit(solutions[0], "sppf.png")
+ try:
+ debug_walker = ForestToPyDotVisitor()
+ except ImportError:
+ logging.warning("Cannot find dependency 'pydot', will not generate sppf debug image")
+ else:
+ debug_walker.visit(solutions[0], "sppf.png")
+
if not solutions:
expected_tokens = [t.expect for t in to_scan]
- # raise ParseError('Incomplete parse: Could not find a solution to input')
- raise ParseError('Unexpected end of input! Expecting a terminal of: %s' % expected_tokens)
+ raise UnexpectedEOF(expected_tokens)
elif len(solutions) > 1:
assert False, 'Earley should not generate multiple start symbol items!'
# Perform our SPPF -> AST conversion using the right ForestVisitor.
- return self.forest_tree_visitor.visit(solutions[0])
+ forest_tree_visitor_cls = ForestToTreeVisitor if self.resolve_ambiguity else ForestToAmbiguousTreeVisitor
+ forest_tree_visitor = forest_tree_visitor_cls(self.callbacks, self.forest_sum_visitor and self.forest_sum_visitor())
+
+ return forest_tree_visitor.visit(solutions[0])
class ApplyCallbacks(Transformer_InPlace):
diff --git a/lark/parsers/xearley.py b/lark/parsers/xearley.py
index 3898d6a..f32d0d1 100644
--- a/lark/parsers/xearley.py
+++ b/lark/parsers/xearley.py
@@ -146,4 +146,5 @@ class Parser(BaseParser):
self.predict_and_complete(i, to_scan, columns, transitives)
## Column is now the final column in the parse.
- assert i == len(columns)-1
\ No newline at end of file
+ assert i == len(columns)-1
+ return to_scan
\ No newline at end of file
diff --git a/lark/visitors.py b/lark/visitors.py
index 4a0f639..c6e4f6b 100644
--- a/lark/visitors.py
+++ b/lark/visitors.py
@@ -3,6 +3,7 @@ from functools import wraps
from .utils import smart_decorator
from .tree import Tree
from .exceptions import VisitError, GrammarError
+from .lexer import Token
###{standalone
from inspect import getmembers, getmro
@@ -21,6 +22,10 @@ class Transformer:
Can be used to implement map or reduce.
"""
+ __visit_tokens__ = False # For backwards compatibility
+ def __init__(self, visit_tokens=False):
+ self.__visit_tokens__ = visit_tokens
+
def _call_userfunc(self, tree, new_children=None):
# Assumes tree is already transformed
children = new_children if new_children is not None else tree.children
@@ -45,10 +50,29 @@ class Transformer:
except Exception as e:
raise VisitError(tree, e)
+ def _call_userfunc_token(self, token):
+ try:
+ f = getattr(self, token.type)
+ except AttributeError:
+ return self.__default_token__(token)
+ else:
+ try:
+ return f(token)
+ except (GrammarError, Discard):
+ raise
+ except Exception as e:
+ raise VisitError(token, e)
+
+
def _transform_children(self, children):
for c in children:
try:
- yield self._transform_tree(c) if isinstance(c, Tree) else c
+ if isinstance(c, Tree):
+ yield self._transform_tree(c)
+ elif self.__visit_tokens__ and isinstance(c, Token):
+ yield self._call_userfunc_token(c)
+ else:
+ yield c
except Discard:
pass
@@ -66,6 +90,11 @@ class Transformer:
"Default operation on tree (for override)"
return Tree(data, children, meta)
+ def __default_token__(self, token):
+ "Default operation on token (for override)"
+ return token
+
+
@classmethod
def _apply_decorator(cls, decorator, **kwargs):
mro = getmro(cls)
@@ -157,6 +186,11 @@ class Visitor(VisitorBase):
self._call_userfunc(subtree)
return tree
+ def visit_topdown(self,tree):
+ for subtree in tree.iter_subtrees_topdown():
+ self._call_userfunc(subtree)
+ return tree
+
class Visitor_Recursive(VisitorBase):
"""Bottom-up visitor, recursive
@@ -169,8 +203,16 @@ class Visitor_Recursive(VisitorBase):
if isinstance(child, Tree):
self.visit(child)
- f = getattr(self, tree.data, self.__default__)
- f(tree)
+ self._call_userfunc(tree)
+ return tree
+
+ def visit_topdown(self,tree):
+ self._call_userfunc(tree)
+
+ for child in tree.children:
+ if isinstance(child, Tree):
+ self.visit_topdown(child)
+
return tree
diff --git a/mkdocs.yml b/mkdocs.yml
index 63bdd61..f5b0d1d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -9,5 +9,6 @@ pages:
- How To Develop (Guide): how_to_develop.md
- Grammar Reference: grammar.md
- Tree Construction Reference: tree_construction.md
+ - Visitors and Transformers: visitors.md
- Classes Reference: classes.md
- Recipes: recipes.md
diff --git a/tests/__main__.py b/tests/__main__.py
index 4762773..901f101 100644
--- a/tests/__main__.py
+++ b/tests/__main__.py
@@ -10,7 +10,7 @@ from .test_reconstructor import TestReconstructor
try:
from .test_nearley.test_nearley import TestNearley
except ImportError:
- pass
+    logging.warning("Skipping tests for Nearley (js2py required)")
# from .test_selectors import TestSelectors
# from .test_grammars import TestPythonG, TestConfigG
diff --git a/tests/test_nearley/test_nearley.py b/tests/test_nearley/test_nearley.py
index 721db1d..647f489 100644
--- a/tests/test_nearley/test_nearley.py
+++ b/tests/test_nearley/test_nearley.py
@@ -15,9 +15,12 @@ NEARLEY_PATH = os.path.join(TEST_PATH, 'nearley')
BUILTIN_PATH = os.path.join(NEARLEY_PATH, 'builtin')
if not os.path.exists(NEARLEY_PATH):
- print("Skipping Nearley tests!")
+    logging.warning("Nearley not installed. Skipping Nearley tests!")
raise ImportError("Skipping Nearley tests!")
+import js2py # Ensures that js2py exists, to avoid failing tests
+
+
class TestNearley(unittest.TestCase):
def test_css(self):
fn = os.path.join(NEARLEY_PATH, 'examples/csscolor.ne')
diff --git a/tests/test_parser.py b/tests/test_parser.py
index 4db5ce9..e9d46e5 100644
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -94,6 +94,24 @@ class TestParsers(unittest.TestCase):
r = g.parse('xx')
self.assertEqual( r.children[0].data, "c" )
+ def test_visit_tokens(self):
+ class T(Transformer):
+ def a(self, children):
+ return children[0] + "!"
+ def A(self, tok):
+ return tok.upper()
+
+ # Test regular
+ g = Lark("""start: a
+ a : A
+ A: "x"
+ """, parser='lalr')
+ r = T().transform(g.parse("x"))
+ self.assertEqual( r.children, ["x!"] )
+ r = T(True).transform(g.parse("x"))
+ self.assertEqual( r.children, ["X!"] )
+
+
def test_embedded_transformer(self):
class T(Transformer):
def a(self, children):
diff --git a/tests/test_trees.py b/tests/test_trees.py
index 4216bd6..edd2a8b 100644
--- a/tests/test_trees.py
+++ b/tests/test_trees.py
@@ -7,7 +7,7 @@ import pickle
import functools
from lark.tree import Tree
-from lark.visitors import Transformer, Interpreter, visit_children_decor, v_args, Discard
+from lark.visitors import Visitor, Visitor_Recursive, Transformer, Interpreter, visit_children_decor, v_args, Discard
class TestTrees(TestCase):
@@ -34,6 +34,43 @@ class TestTrees(TestCase):
nodes = list(self.tree1.iter_subtrees_topdown())
self.assertEqual(nodes, expected)
+ def test_visitor(self):
+ class Visitor1(Visitor):
+ def __init__(self):
+ self.nodes=[]
+
+ def __default__(self,tree):
+ self.nodes.append(tree)
+ class Visitor1_Recursive(Visitor_Recursive):
+ def __init__(self):
+ self.nodes=[]
+
+ def __default__(self,tree):
+ self.nodes.append(tree)
+
+ visitor1=Visitor1()
+ visitor1_recursive=Visitor1_Recursive()
+
+ expected_top_down = [Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]),
+ Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')]
+        expected_bottom_up = [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z'),
+                              Tree('a', [Tree('b', 'x'), Tree('c', 'y'), Tree('d', 'z')])]
+
+        visitor1.visit(self.tree1)
+        self.assertEqual(visitor1.nodes, expected_bottom_up)
+
+        visitor1_recursive.visit(self.tree1)
+        self.assertEqual(visitor1_recursive.nodes, expected_bottom_up)
+
+ visitor1.nodes=[]
+ visitor1_recursive.nodes=[]
+
+ visitor1.visit_topdown(self.tree1)
+ self.assertEqual(visitor1.nodes,expected_top_down)
+
+ visitor1_recursive.visit_topdown(self.tree1)
+ self.assertEqual(visitor1_recursive.nodes,expected_top_down)
+
def test_interp(self):
t = Tree('a', [Tree('b', []), Tree('c', []), 'd'])