GPC's Lexical Analyzer

The source file gpc-lex.c contains the lexical analyzer of the GNU Pascal compiler. (For those of you who know flex: this file was not generated by flex but is maintained by hand.) This very first stage of the compiler proper (the preprocessor, a separate executable, runs before it) is responsible for reading what you have written and dividing it into tokens, the "atoms" of every computer language. gpc-lex.c essentially consists of one large function, yylex().

This is where, for example, the real number 3.14 and the subrange 3..14 are distinguished, and where Borland-style character constants like #13 and ^M are recognized. This is not always a trivial task. For example, look at the following type declaration:

type
  X = ^Y;
  Y = packed array [^A .. ^B] of Char;
  Z = ^A .. ^Z;

If you wish to know how GPC distinguishes the pointer forward declaration ^Y and the subrange ^A..^Z, see gpc-lex.c, function yylex(), case '^': in the big switch statement.
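
The simpler of the distinctions mentioned above, between the real number 3.14 and the subrange 3..14, comes down to one character of extra lookahead after a dot. The following C fragment is a minimal sketch of that idea, not code taken from gpc-lex.c; the names num_kind and scan_number are hypothetical, and exponents and other number formats are left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not GPC's actual names. */
enum num_kind { NUM_INTEGER, NUM_REAL };

/*
 * Scan a numeric literal starting at s (s[0] is assumed to be a digit).
 * Stores the number of characters consumed in *len and returns the kind.
 * A '.' counts as a decimal point only if a digit follows it, so for
 * "3..14" the scan stops after "3" and leaves ".." for the next token.
 */
enum num_kind scan_number(const char *s, size_t *len)
{
    size_t i = 0;
    enum num_kind kind = NUM_INTEGER;

    /* Integer part. */
    while (isdigit((unsigned char)s[i]))
        i++;

    /* Fractional part: requires '.' followed directly by a digit. */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        kind = NUM_REAL;
        i++;                        /* consume the decimal point */
        while (isdigit((unsigned char)s[i]))
            i++;
    }

    *len = i;
    return kind;
}
```

Under these assumptions, scanning "3.14" consumes all four characters as a real, whereas scanning "3..14" consumes only the 3 and leaves the .. to be read as a separate token.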

There are several situations where GPC's lexical analyzer becomes context-sensitive. One is described above; another is the token protected, a reserved word in ISO 10206 Extended Pascal but an ordinary identifier in ISO 7185 Pascal. It appears in parameter lists

procedure foo (protected bar: Integer);

and says that the parameter bar must not be changed inside the body of the procedure.

On the other hand, if you write a valid ISO 7185 Pascal program, you can declare a parameter named protected:

procedure foo (protected, bar: Integer);

Here the two standards contradict each other. GPC solves this problem by checking explicitly for "protected" in the lexical analyzer: if a comma or a colon follows, it is an ordinary identifier; otherwise it is a reserved word. With this rule, GPC even understands

procedure foo (protected protected: Integer);

without losing the special meaning of protected as a reserved word.

The responsible code is in gpc-lex.c; look for PROTECTED.
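
As a rough illustration of the comma-or-colon rule described above, here is a minimal C sketch. It is not the code in gpc-lex.c: the names word_kind and classify_protected are hypothetical, and the sketch only skips whitespace, ignoring comments and anything else a real lexer would have to step over.

```c
#include <ctype.h>

/* Hypothetical token kinds for this sketch; not GPC's actual names. */
enum word_kind { WORD_IDENTIFIER, WORD_RESERVED };

/*
 * Classify one occurrence of the word "protected".
 * rest points at the input immediately after the word. Whitespace is
 * skipped, then the next character is inspected without consuming it:
 * ',' or ':' means the word is being used as an ordinary name.
 */
enum word_kind classify_protected(const char *rest)
{
    while (isspace((unsigned char)*rest))
        rest++;

    return (*rest == ',' || *rest == ':') ? WORD_IDENTIFIER
                                          : WORD_RESERVED;
}
```

Applied to "protected protected: Integer", the first occurrence is followed by another word and is classified as reserved, while the second is followed by a colon and is classified as an identifier.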

If you ever encounter a bug in the lexical analyzer, you now know where to hunt for it.