Node:Mid-Rule Actions, Previous:Action Types, Up:Semantics
Occasionally it is useful to put an action in the middle of a rule. These actions are written just like usual end-of-rule actions, but they are executed before the parser even recognizes the following components.
A mid-rule action may refer to the components preceding it using
$n
, but it may not refer to subsequent components because
it is run before they are parsed.
The mid-rule action itself counts as one of the components of the rule.
This makes a difference when there is another action later in the same rule
(and usually there is another at the end): you have to count the actions
along with the symbols when working out which number n to use in
$n
.
The mid-rule action can also have a semantic value. The action can set
its value with an assignment to $$
, and actions later in the rule
can refer to the value using $n
. Since there is no symbol
to name the action, there is no way to declare a data type for the value
in advance, so you must use the $<...>n
construct to
specify a data type each time you refer to this value.
There is no way to set the value of the entire rule with a mid-rule
action, because assignments to $$
do not have that effect. The
only way to set the value for the entire rule is with an ordinary action
at the end of the rule.
Here is an example from a hypothetical compiler, handling a let
statement that looks like let (variable) statement
and
serves to create a variable named variable temporarily for the
duration of statement. To parse this construct, we must put
variable into the symbol table while statement is parsed, then
remove it afterward. Here is how it is done:
stmt: LET '(' var ')' { $<context>$ = push_context (); declare_variable ($3); } stmt { $$ = $6; pop_context ($<context>5); }
As soon as let (variable)
has been recognized, the first
action is run. It saves a copy of the current semantic context (the
list of accessible variables) as its semantic value, using alternative
context
in the data-type union. Then it calls
declare_variable
to add the new variable to that list. Once the
first action is finished, the embedded statement stmt
can be
parsed. Note that the mid-rule action is component number 5, so the
stmt
is component number 6.
After the embedded statement is parsed, its semantic value becomes the
value of the entire let
-statement. Then the semantic value from the
earlier action is used to restore the prior list of variables. This
removes the temporary let
-variable from the list so that it won't
appear to exist while the rest of the program is parsed.
Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute the
action. For example, the following two rules, without mid-rule actions,
can coexist in a working parser because the parser can shift the open-brace
token and look at what follows before deciding whether there is a
declaration or not:
compound: '{' declarations statements '}' | '{' statements '}' ;
But when we add a mid-rule action as follows, the rules become nonfunctional:
compound: { prepare_for_local_variables (); } '{' declarations statements '}' | '{' statements '}' ;
Now the parser is forced to decide whether to run the mid-rule action when it has read no farther than the open-brace. In other words, it must commit to using one rule or the other, without sufficient information to do it correctly. (The open-brace token is what is called the look-ahead token at this time, since the parser is still deciding what to do about it. See Look-Ahead Tokens.)
You might think that you could correct the problem by putting identical
actions into the two rules, like this:
compound: { prepare_for_local_variables (); } '{' declarations statements '}' | { prepare_for_local_variables (); } '{' statements '}' ;
But this does not help, because Bison does not realize that the two actions are identical. (Bison never tries to understand the C code in an action.)
If the grammar is such that a declaration can be distinguished from a
statement by the first token (which is true in C), then one solution which
does work is to put the action after the open-brace, like this:
compound: '{' { prepare_for_local_variables (); } declarations statements '}' | '{' statements '}' ;
Now the first token of the following declaration or statement, which would in any case tell Bison which rule to use, can still do so.
Another solution is to bury the action inside a nonterminal symbol which
serves as a subroutine:
subroutine: /* empty */ { prepare_for_local_variables (); } ; compound: subroutine '{' declarations statements '}' | subroutine '{' statements '}' ;
Now Bison can execute the action in the rule for subroutine
without
deciding which rule for compound
it will eventually use. Note that
the action is now at the end of its rule. Any mid-rule action can be
converted to an end-of-rule action in this way, and this is what Bison
actually does to implement mid-rule actions.