Package: AwkProc
The package AwkProc is meant to provide an AWK-like programming
environment to Tcl. It is an all-Tcl package that simulates AWK's
pattern-action model.
Like AWK it has special patterns to mark the beginning and end
of a file, to process several files one after another and so on.
It is, however, not a complete emulation of AWK.
This document describes what it can do and what it cannot do
without additional programming. Since it is set up as an
ordinary Tcl package, any capability that is lacking
can in principle be added with ordinary Tcl code.
This document describes AwkProc, version 0.2, June 2001.
Usage of AwkProc is free, as long as you acknowledge the
author, Arjen Markus (e-mail: arjen.markus@wldelft.nl).
There is no guarantee nor claim that the results are accurate.
The AwkProc package defines the following public procedures:
-
Pattern patt action
Define a simple pattern with some action:
patt - A literal string to search in the input line
action - Action to take if the pattern matches
-
RegPattern patt action
Define a regular expression with some action
patt - Regular expression to match the input line with
action - Action to take if the pattern matches
-
Content subcommand args
Manipulate the status information (the "content")
The subcommand can be any of the following:
-
init - Initialise the content
-
initProc - Define the initialisation procedure
-
export - Execute the export procedure
-
exportProc - Define the export procedure
-
set - Set a variable in the state array
-
append - Append text to a variable in the state array
-
lappend - Append an element to a list variable in the state array
-
get - Return the current value for the variable
The state information is actually an array within
the namespace.
-
DefaultAction action
Define a default action per line. It is executed after the
specific patterns have been tried, whether or not any of them matched.
action - Action to take for each line
-
BeginFile action
Define an action for the beginning of a file
action - Action to take
-
EndFile action
Define an action for the end of a file
action - Action to take
-
ProcessFiles args
Process the given files, using the given patterns
args - List of files to be processed
-
GetNextLine
Read the next line of input. The pattern matching
then continues with the new input line.
(no arguments)
-
SkipPatterns
Skip the remaining patterns for this input line.
The current action is first completed, then the next input
line is read and the pattern matching starts all over.
Note:
The default action, if one is defined, is still executed.
(no arguments)
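The Content subcommands listed above can be pictured with a small stand-in. This is only a sketch of how state kept in an array inside a namespace might be manipulated: the namespace ::demo and its Content procedure are hypothetical and written for this illustration, they are not the AwkProc implementation. The argument order (variable name first, then value) follows the "Content set anchor 0" form used in this document and is assumed to be the same for append and lappend.

```tcl
# Hypothetical stand-in for illustration only; NOT the AwkProc implementation.
namespace eval ::demo {
    variable state            ;# array holding the "content"
    array set state {}
}

# Dispatch on the subcommand and operate on the namespace array
proc ::demo::Content {subcommand args} {
    variable state
    switch -- $subcommand {
        set     { lassign $args name value; set state($name) $value }
        append  { lassign $args name text;  append state($name) $text }
        lappend { lassign $args name item;  lappend state($name) $item }
        get     { lassign $args name;       return $state($name) }
        default { error "unknown subcommand \"$subcommand\"" }
    }
}

# Usage: a scalar, a string that grows, and a list that grows
::demo::Content set anchor 0
::demo::Content set anchor 42
::demo::Content append title "Awk"
::demo::Content append title "Proc"
::demo::Content lappend globals foo
::demo::Content lappend globals bar
puts [::demo::Content get anchor]
puts [::demo::Content get title]
puts [::demo::Content get globals]
```

Keeping the state in one array makes it easy to reset everything at the start of each file, which is presumably what the init subcommand is for.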
Pattern matching works as follows:
-
Each call to Pattern or RegPattern defines in fact a new small
procedure that gets executed for each input line.
-
These procedures are called in the order of their definition.
This may be important, because quite often state variables have
to be updated.
-
All procedures implementing the patterns have access to the
following (global) variables that are updated automatically:
-
LINE - the contents of the current input line
-
NL - the line count within the current file
-
NLT - the total line count
-
MATCH - the part of the input line that matched
the pattern (this is not currently set)
-
FILENAME - the name of the current file
Note:
As the global variables, especially LINE, are not protected, you can
use this fact to prepare the input for further processing. For instance:
#
# First convert the input to lower-case, and trim any blanks
# - this makes the pattern much simpler
#
RegPattern "." {
    set LINE [string trim [string tolower $LINE]]
}
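The mechanism described above (patterns tried in the order of their definition, with actions that may change LINE for the patterns that follow) can be sketched in a few lines of plain Tcl. The namespace ::mini and its procedures are hypothetical names invented for this sketch; it stores the rules in a list rather than generating procedures, so it is not how AwkProc itself is written, and it only maintains LINE and NL.

```tcl
# Hypothetical miniature of a pattern-action loop; NOT AwkProc's own code.
namespace eval ::mini {
    variable rules {}   ;# list of pattern/action pairs, in definition order
}

# Register a regular expression and the script to run when it matches
proc ::mini::RegPattern {patt action} {
    variable rules
    lappend rules [list $patt $action]
}

# Feed each line to every rule, in the order the rules were defined
proc ::mini::ProcessLines {lines} {
    variable rules
    global LINE NL
    set NL 0
    foreach LINE $lines {
        incr NL
        foreach rule $rules {
            lassign $rule patt action
            if {[regexp -- $patt $LINE]} {
                # Actions run at global level, so they see LINE and NL
                uplevel #0 $action
            }
        }
    }
}

# Usage: the first rule normalises LINE, the second one relies on that
set ::seen {}
::mini::RegPattern {.} {
    set LINE [string trim [string tolower $LINE]]
}
::mini::RegPattern {^proc } {
    lappend ::seen "${NL}: $LINE"
}
::mini::ProcessLines [list "  PROC Foo {a b}  " "set x 1" "proc bar {}"]
puts [join $::seen \n]
```

Swapping the two rule definitions would make the second pattern miss the first line, which is why the order of definition matters.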
This section provides a concise comparison between the AwkProc package
and the AWK language as known from UNIX. As AwkProc is by no means meant
to be complete, there are quite a few limitations with respect to AWK.
However, as AwkProc is written in Tcl, users can take advantage of a
much more widely applicable scripting language.
Some obvious limitations of AwkProc:
-
AWK allows patterns like "len > 70", whereas AwkProc only allows
literal strings and regular expressions.
-
AWK automatically splits the input into fields (as defined by
the field separator). For performance reasons, AwkProc does not.
If individual fields are required within some action, use
[split] and list operations.
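Both limitations can be worked around inside an action with ordinary Tcl commands. The sketch below is plain Tcl and does not need the package; the helper awkFields is a hypothetical name introduced here, and LINE is set by hand to stand for the current input line.

```tcl
# Hypothetical helper: split a line into whitespace-separated fields,
# ignoring leading and trailing blanks and runs of blanks.
proc awkFields {line} {
    return [regexp -all -inline {\S+} $line]
}

set LINE "  alpha   beta gamma  "
set fields [awkFields $LINE]
puts [lindex $fields 1]      ;# second field (list indices start at 0)
puts [llength $fields]       ;# number of fields

# An explicit single-character separator maps directly onto split
set parts [split "root:x:0:0" ":"]
puts [lindex $parts 0]

# A condition on the line length becomes an ordinary test inside the action
if {[string length $LINE] > 70} {
    puts "long line: $LINE"
}
```

Note that [split] on a single blank would produce empty elements for consecutive blanks, which is why the helper uses a regular expression instead.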
Some obvious advantages of AwkProc:
-
The scope of variables is well-defined (this is a rather messy
aspect of AWK).
-
The package is as portable as Tcl itself (AWK is not standard
on, say, the Windows platform).
-
The package can be used both as a standalone tool and as
part of a larger application (AWK is restricted to itself).
This section presents a small example that is intended to illustrate
the type of processing that can be done, rather than something
practical or complete.
The purpose is to:
-
Extract lines from a Tcl source file that are the headers of
procedures
-
Print them with their line numbers
-
Print the global variables used within each procedure
-
Count the total number of lines between successive procedure
headers (without regard for comment lines and such)
Extracting the procedure headers (that is, the procedure name and
arguments) is easy, as long as they appear on one line. In regular
Tcl this would read:
if { [regexp {^[ ]*proc } $LINE] } { ... }
Similarly, finding the keyword "global" is easy. Just replace "proc"
by "global".
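If the procedure name itself is wanted rather than the whole line, the same regular expression can be extended with a capturing group. This is a small plain-Tcl sketch; the helper procName is a hypothetical name, not something the package provides, and it makes the same assumption that the header sits on one line.

```tcl
# Hypothetical helper (not part of AwkProc): return the procedure name
# found on a one-line "proc" header, or an empty string if there is none.
proc procName {line} {
    if {[regexp {^[ ]*proc[ ]+([^ ]+)} $line -> name]} {
        return $name
    }
    return ""
}

puts [procName "   proc greet {name} {}"]    ;# prints: greet
```

A header line usually ends with an unbalanced open brace, so it is not a well-formed Tcl list; that is why the name is picked out with regexp here rather than with lindex.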
Because we want to know how many lines of code there are between the
procedure headers (making no attempt to exclude comments, which would
only complicate matters), we keep a single state variable, anchor,
which holds the line number of the last "proc" encountered.
Together with some print statements and some calculations, we end up
with a script like this:
package require AwkProc
# Start every file with no procedure header seen yet
::AwkProc::BeginFile {
    ::AwkProc::Content set anchor 0
}
# Procedure header: report the size of the previous procedure, then print this header
::AwkProc::RegPattern {^[ ]*proc } {
    if { [::AwkProc::Content get anchor] > 0 } {
        set numlines [expr {$NL - [::AwkProc::Content get anchor]}]
        puts " (roughly $numlines lines of code)\n"
    }
    puts [format "%5d: %s" $NL $LINE]
    ::AwkProc::Content set anchor $NL
}
# List the global variables declared on this line
::AwkProc::RegPattern {^[ ]*global } {
    puts " Global(s): [lrange $LINE 1 end]"
}
::AwkProc::EndFile {
    ::AwkProc::Content set anchor 0
}
# Process the files named on the command line
::AwkProc::ProcessFiles $::argv
The list below is essentially a to-do list:
-
Currently the exporting and importing of the public procedures
is not done properly.
Suggestion: Always use the namespace ::AwkProc:: as
a prefix for commands.
-
It is not possible to specify options (like -nocase) for the
patterns.
Suggestion: See the example in the section AN EXAMPLE
-
The updating of the line numbers should take place in GetNextLine,
rather than in ProcessFiles.
-
The variable MATCH is not set properly. It should record the part
of the input line that matches the current pattern.
-
The SkipPatterns procedure does not cause the default action to be
skipped.
-
It should be possible to "wrap" the GetNextLine procedure, so that
sources other than files can be used or special preparations can
be done more elegantly.
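For the missing -nocase option there is a possible second workaround besides converting LINE to lower-case first: Tcl's regular expression syntax accepts embedded options at the start of a pattern, such as (?i) for case-insensitive matching. The sketch below shows this with the plain regexp command only. Whether RegPattern hands its pattern to regexp unchanged is an assumption that has not been verified for AwkProc 0.2.

```tcl
# Plain Tcl: the embedded option (?i) makes a regular expression
# case-insensitive without the -nocase switch.
set line "  PROC Foo {a b}"

puts [regexp {^[ ]*proc } $line]        ;# prints 0: case-sensitive, no match
puts [regexp {(?i)^[ ]*proc } $line]    ;# prints 1: case-insensitive match
```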