The source file gpc-lex.c contains the so-called lexical analyzer of
the GNU Pascal compiler. (For those of you who know flex: this file
was not created using flex but is maintained manually.) This very
first stage of the compiler (after the preprocessor, which is a
separate executable) is responsible for reading what you have written
and dividing it into tokens, the "atoms" of every computer language.
The source gpc-lex.c essentially contains one large function, yylex().
Here is, for example, where the real number 3.14 and the subrange
3..14 are distinguished, and where Borland-style character constants
like #13 and ^M are recognized.
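A minimal sketch in C, the language gpc-lex.c is written in, of how a hand-written scanner can tell 3.14 from 3..14 with one character of lookahead and read a #13 constant. The token kinds and the function next_token() are hypothetical, not GPC's actual code, and the sketch ignores other Pascal number forms (e.g. exponents without a decimal point).

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds for this sketch; not GPC's token codes. */
enum tok { TOK_INT, TOK_REAL, TOK_CHAR, TOK_DOTDOT, TOK_OTHER, TOK_EOF };

struct token {
  enum tok kind;
  long ival;    /* value of TOK_INT, or character code of TOK_CHAR */
  double rval;  /* value of TOK_REAL */
};

/* Scan one token starting at *p and advance *p past it. */
static struct token next_token(const char **p)
{
  struct token t = { TOK_OTHER, 0, 0.0 };
  const char *s = *p;
  char *end;

  while (isspace((unsigned char)*s))
    s++;

  if (*s == '\0') {
    t.kind = TOK_EOF;
  } else if (isdigit((unsigned char)*s)) {
    const char *start = s;
    while (isdigit((unsigned char)*s))
      s++;
    /* One character of lookahead: "3.14" has a digit after the '.',
       so it is a real number.  In "3..14" the '.' is followed by a
       second '.', so the integer 3 ends here and ".." is scanned
       as a token of its own on the next call. */
    if (s[0] == '.' && isdigit((unsigned char)s[1])) {
      t.kind = TOK_REAL;
      t.rval = strtod(start, &end);
    } else {
      t.kind = TOK_INT;
      t.ival = strtol(start, &end, 10);
    }
    s = end;
  } else if (s[0] == '.' && s[1] == '.') {
    t.kind = TOK_DOTDOT;
    s += 2;
  } else if (s[0] == '#' && isdigit((unsigned char)s[1])) {
    /* Borland-style character constant: #13 is the character with code 13. */
    t.kind = TOK_CHAR;
    t.ival = strtol(s + 1, &end, 10);
    s = end;
  } else {
    s++;  /* any other character: a one-character token in this sketch */
  }

  *p = s;
  return t;
}
```

Called repeatedly on "3..14", this yields TOK_INT (3), TOK_DOTDOT and TOK_INT (14), while "3.14" yields a single TOK_REAL.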
Telling such tokens apart is not always trivial; for example, look at
the following type declaration:
type X = ^Y; Y = packed array [^A .. ^B] of Char; Z = ^A .. ^Z;
If you wish to know how GPC distinguishes the pointer forward
declaration ^Y and the subrange ^A..^Z, see gpc-lex.c, function
yylex(), case '^': in the big switch statement.
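The following C sketch illustrates one possible heuristic that happens to classify every ^ in the declaration above correctly: ^ followed by a single letter is read as a character constant when it stands next to a .. token, and as the start of a pointer type otherwise. This is a hypothetical rule for illustration, not the logic in GPC's case '^':, and classify_caret() and caret_char_value() are invented names. The rule is too weak on its own: it would treat ^M in c := ^M; as a pointer, so a real lexer needs more context than this.

```c
#include <ctype.h>

/* Hypothetical classification used only in this sketch. */
enum caret_kind { CARET_POINTER, CARET_CHAR_CONST };

/* s points at a '^' in the source text; prev_was_dotdot is nonzero if the
   token scanned just before it was "..". */
static enum caret_kind classify_caret(const char *s, int prev_was_dotdot)
{
  const char *after;

  /* Only '^' followed by exactly one letter can be a character constant;
     "^Foo" is always the start of a pointer type. */
  if (!isalpha((unsigned char)s[1]) || isalnum((unsigned char)s[2]) || s[2] == '_')
    return CARET_POINTER;

  /* Upper bound of a subrange, as in "^A .. ^Z". */
  if (prev_was_dotdot)
    return CARET_CHAR_CONST;

  /* Lower bound of a subrange: the next token is "..". */
  after = s + 2;
  while (isspace((unsigned char)*after))
    after++;
  if (after[0] == '.' && after[1] == '.')
    return CARET_CHAR_CONST;

  /* Otherwise, e.g. "^Y;", treat it as a pointer type. */
  return CARET_POINTER;
}

/* Code of the control character ^X: the upper-case letter's ASCII code
   XOR 0x40, so ^M == 13 (the same character as #13) and ^A == 1. */
static int caret_char_value(char letter)
{
  return toupper((unsigned char)letter) ^ 0x40;
}
```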
There are several situations where GPC's lexical analyzer becomes
context-sensitive. One is described above; another example is the
token protected, a reserved word in ISO 10206 Extended Pascal but an
ordinary identifier in ISO 7185 Pascal. It appears in parameter lists:
procedure foo (protected bar: Integer);
and says that the parameter bar must not be changed inside the body
of the procedure.
On the other hand, in a valid ISO 7185 Pascal program you can declare
a parameter named protected:
procedure foo (protected, bar: Integer);
Here the two standards contradict each other. GPC solves this problem
by checking explicitly for "protected" in the lexical analyzer: if a
comma or a colon follows, it is an ordinary identifier; otherwise it
is a reserved word. With this rule, GPC even understands
procedure foo (protected protected: Integer);
without losing the special meaning of protected as a reserved word.
The responsible code is in gpc-lex.c; look for PROTECTED.
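The comma-or-colon rule needs only one token of lookahead. A minimal sketch in C, assuming the lexer has just scanned an identifier and can peek at the source text that follows it; classify_word(), is_keyword() and the token codes are hypothetical, not the actual code in gpc-lex.c. Like Pascal, it compares case-insensitively, and for brevity it skips only blanks, not comments, before looking at the next character.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_IDENTIFIER = 1, TOK_PROTECTED };

/* Case-insensitive comparison of word[0..len) with a lower-case keyword,
   since Pascal identifiers are not case-sensitive. */
static int is_keyword(const char *word, size_t len, const char *kw)
{
  size_t i;
  for (i = 0; i < len; i++)
    if (kw[i] == '\0' || tolower((unsigned char)word[i]) != kw[i])
      return 0;
  return kw[len] == '\0';
}

/* word/len: an identifier the lexer has just scanned;
   rest: the source text immediately after it. */
static int classify_word(const char *word, size_t len, const char *rest)
{
  if (!is_keyword(word, len, "protected"))
    return TOK_IDENTIFIER;

  /* Peek at the next non-blank character (comments are not skipped here). */
  while (isspace((unsigned char)*rest))
    rest++;

  /* "protected," or "protected:" is an ordinary ISO 7185 identifier;
     anything else makes it the Extended Pascal reserved word. */
  if (*rest == ',' || *rest == ':')
    return TOK_IDENTIFIER;
  return TOK_PROTECTED;
}
```

For procedure foo (protected protected: Integer);, the first protected is followed by a blank and a letter and becomes TOK_PROTECTED; the second is followed by a colon and becomes TOK_IDENTIFIER.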
If you ever encounter a bug in the lexical analyzer, now you know where to hunt for it.