attribute as a weak keyword

Note that in the following we use the spelling attribute when referring to the directive and Attribute when referring to an identifier. This follows the GPCS and should make the text clearer. However, the spelling cannot be a criterion for resolving the conflict, since the compiler must treat both spellings equally. The same applies, of course, to the line breaks and white-space used here for readability.

Making attribute a weak keyword leads to a shift/reduce (S/R) conflict in variable declarations (whereas routine declarations are free of conflicts). Consider this case:

     var
       a: Integer; attribute (...)

vs.

     var
       a: Integer;
       Attribute: ...

After reading the ;, the parser must decide whether to shift it or to reduce to a variable declaration. But the next token, attribute, does not settle the question, because it may be either the directive or an identifier, and bison can only look ahead one token.

The following token would resolve the problem, since the directive attribute is always followed by ( whereas an identifier in a variable declaration can be followed by , or :, but never (.

More generally, an identifier in an id_list in the parser can never be followed by ( (while identifiers in other contexts can be, e.g. in function calls). This must be carefully checked manually through the whole grammar!

Thus, the solution consists of two steps. Firstly, the lexer does the additional look-ahead that bison can't do. When it reads the word attribute (and it is not disabled by dialect options or by the user or shadowed by some declaration), then if the next token is not (, it can only be an identifier, so the lexer returns LEX_ID. If the next token is (, the lexer returns p_attribute.

Lexer look-ahead is not really nice either, e.g. because it increases the “shift” of compiler directives. At least we only have to read ahead two characters plus preceding white-space, and not an actual token. Two characters are needed because a plain ( must be told apart from the two-character token (. that begins the same way. Reading ahead an actual token would add the additional complications of saving and restoring lexer semantic values and the state of lexer/parser interrelation variables such as lex_const_equal, and of then either lexing the token again later or handling the cases where the parser modifies these variables in between. This would get really messy.
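The character-level look-ahead described above might be sketched in C roughly as follows. This is only an illustration, not the actual lexer code: the function name classify_attribute, the token values, and the use of a NUL-terminated buffer are assumptions made for the example. It also ignores comments, compiler directives, and the checks for dialect options or shadowing declarations.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes: the names follow the text, the values are
   arbitrary and chosen only for this sketch. */
enum token { LEX_ID = 1, P_ATTRIBUTE = 2 };

/* `rest` points just past the word "attribute" in a NUL-terminated
   input buffer (an assumption of this sketch).  Skip white-space,
   then inspect at most two characters. */
enum token classify_attribute(const char *rest)
{
    size_t i = 0;

    while (rest[i] != '\0' && isspace((unsigned char) rest[i]))
        i++;

    /* A plain "(" means the directive.  "(." is a different
       two-character token, so it must not be taken for "(". */
    if (rest[i] == '(' && rest[i + 1] != '.')
        return P_ATTRIBUTE;

    /* Anything else (",", ":", end of input, ...) means identifier. */
    return LEX_ID;
}
```

With this sketch, the input following attribute in the first declaration above, which starts with white-space and (, yields P_ATTRIBUTE, while the : following Attribute in the second declaration yields LEX_ID.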

Secondly, the parser accepts p_attribute as an identifier except in an id_list. To achieve this, the nonterminal new_identifier_limited is used within id_list.

Note: Using new_identifier_limited does not mean that Attribute can't be used as an identifier in this place. Instead, this nonterminal can never be followed by (, so the lexer will have turned Attribute into a LEX_ID token already.

Actually, that's not all: In a constant_definition, the conflict is not against id_list, but against a simple new_identifier. But we can just use new_identifier_limited instead in the constant_definition rule.
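The parser side of the solution can be modelled in a few lines of C. This is a sketch of the idea only, not the bison grammar: the context names, the function accepted_as_identifier, and the assumption that new_identifier is the nonterminal that takes both tokens are made up for the illustration.

```c
#include <stdbool.h>

/* Hypothetical token codes and contexts for this sketch only. */
enum token { LEX_ID = 1, P_ATTRIBUTE = 2 };
enum context { CTX_ID_LIST, CTX_CONSTANT_DEFINITION, CTX_OTHER };

/* Models new_identifier (assumed): an ordinary identifier, or the
   weak keyword token used as an identifier. */
static bool matches_new_identifier(enum token t)
{
    return t == LEX_ID || t == P_ATTRIBUTE;
}

/* Models new_identifier_limited: ordinary identifiers only. */
static bool matches_new_identifier_limited(enum token t)
{
    return t == LEX_ID;
}

/* id_list and constant_definition use the limited nonterminal;
   every other context accepts p_attribute as an identifier too. */
bool accepted_as_identifier(enum token t, enum context ctx)
{
    if (ctx == CTX_ID_LIST || ctx == CTX_CONSTANT_DEFINITION)
        return matches_new_identifier_limited(t);
    return matches_new_identifier(t);
}
```

The limited contexts lose nothing in this model: an identifier there is never followed by (, so the lexer has already delivered it as LEX_ID.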

This finally solves all conflicts with attribute. fjf792*.pas are test programs for these cases.