


Language Definition: GPC's Parser

The file parse.y contains the "bison" source code of GNU Pascal's parser. This stage of the compilation analyzes and checks the syntax of your Pascal program and generates an intermediate, language-independent representation, which is then passed to the GNU back-end.

The bison language is essentially a machine-readable form of the Backus-Naur Form (BNF), a symbolic notation for the grammars of computer languages. "Syntax diagrams" are a graphical variant of the Backus-Naur Form.

For details about the "bison" language, see the Bison manual. What follows is a short overview of how to extract from parse.y the information you might need for programming.

Suppose you have forgotten how a variable is declared in Pascal. After some searching in parse.y, you find the following:

/* variable declaration part */

variable_declaration_part:
    LEX_VAR variable_declaration_list semi
  | LEX_VAR semi
      { error ("missing variable declaration"); }
  ;

variable_declaration_list:
    variable_declaration
  | variable_declaration_list semi variable_declaration
      { yyerrok; }
  | error
  | variable_declaration_list error variable_declaration
      {
        error ("missing semicolon");
        yyerrok;
      }
  | variable_declaration_list semi error
  ;

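Translated into English, this means: "The variable declaration part consists of the reserved word (lexical token) var, followed by a `variable declaration list' and a semicolon. A semicolon immediately following var is an error. A `variable declaration list' in turn consists of one or more `variable declarations', separated by semicolons." (The last statement follows from the recursive definition of variable_declaration_list: a list is either a single declaration or a shorter list followed by a semicolon and one more declaration. The alternatives containing the special token error handle error recovery; for example, two declarations with no semicolon between them produce the message "missing semicolon".)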

Now we can go on and search for variable_declaration.

variable_declaration:
    id_list
      {
        [...]
      }
    enable_caret ':' optional_qualifier_list type_denoter
      {
        [...]
      }
    absolute_or_value_specification
      {
        [...]
      }
  ;

(The [...] are placeholders for some C statements which aren't important for understanding GPC's grammar.)

From this you can see that a variable declaration in GNU Pascal consists of an "id list", followed by "enable_caret" (whatever that means), a colon, an "optional qualifier list", a "type denoter", and an "absolute or value specification". Some of these parts are easy to understand; the others you can look up in parse.y. Remember that the reserved word var precedes all this, and a semicolon follows it.

Now you know how to obtain the exact grammar of the GNU Pascal language from the source.

The C statements not shown above are, in some sense, the most important part of the bison source, because they generate the intermediate code of the GNU Pascal front-end: the so-called tree nodes, which are used to represent most things in the compiler. For instance, the C code in "type denoter" returns information about the type by assigning it to $$ as a value of type tree.

The "variable declaration" rule receives this and other information through the numbered values ($1, $2, etc.) and passes it to C functions declared in the other source files. Generally, those functions do the real work, while the main job of the C statements in the parser is to call them and pass their arguments around.

In short, the parser is the place where the compiler becomes Pascal.