Bison 1.35

Node:Mystery Conflicts, Next:Stack Overflow, Previous:Reduce/Reduce, Up:Algorithm

Mysterious Reduce/Reduce Conflicts

Sometimes reduce/reduce conflicts can occur that don't look warranted. Here is an example:

%token ID

%%
def:    param_spec return_spec ','
        ;
param_spec:
             type
        |    name_list ':' type
        ;
return_spec:
             type
        |    name ':' type
        ;
type:        ID
        ;
name:        ID
        ;
name_list:
             name
        |    name ',' name_list
        ;

It would seem that this grammar can be parsed with only a single token of look-ahead: when a param_spec is being read, an ID is a name if a comma or colon follows, or a type if another ID follows. In other words, this grammar is LR(1).

However, Bison, like most parser generators, cannot actually handle all LR(1) grammars. In this grammar, two contexts, that after an ID at the beginning of a param_spec and likewise at the beginning of a return_spec, are similar enough that Bison assumes they are the same. They appear similar because the same set of rules would be active--the rule for reducing to a name and that for reducing to a type. Bison is unable to determine at that stage of processing that the rules would require different look-ahead tokens in the two contexts, so it makes a single parser state for them both. Combining the two contexts causes a conflict later. In parser terminology, this occurrence means that the grammar is not LALR(1).

In general, it is better to fix deficiencies than to document them. But this particular deficiency is intrinsically hard to fix; parser generators that can handle LR(1) grammars are hard to write and tend to produce parsers that are very large. In practice, Bison is more useful as it is now.

When the problem arises, you can often fix it by identifying the two parser states that are being confused, and adding something to make them look distinct. In the above example, adding one rule to return_spec as follows makes the problem go away:

%token BOGUS
...
%%
...
return_spec:
             type
        |    name ':' type
        /* This rule is never used.  */
        |    ID BOGUS
        ;

This corrects the problem because it introduces the possibility of an additional active rule in the context after the ID at the beginning of return_spec. This rule is not active in the corresponding context in a param_spec, so the two contexts receive distinct parser states. As long as the token BOGUS is never generated by yylex, the added rule cannot alter the way actual input is parsed.

In this particular example, there is another way to solve the problem: rewrite the rule for return_spec to use ID directly instead of via name. This also causes the two confusing contexts to have different sets of active rules, because the one for return_spec activates the altered rule for return_spec rather than the one for name.

param_spec:
             type
        |    name_list ':' type
        ;
return_spec:
             type
        |    ID ':' type
        ;