Incompatibilities with lex and POSIX

flex is a rewrite of the Unix tool lex (the two implementations do not
share any code, though), with some extensions and incompatibilities,
both of which are of concern to those who wish to write scanners
acceptable to either implementation. At present, the POSIX lex draft is
very close to the original lex implementation, so some of these
incompatibilities are also in conflict with the POSIX draft. But the
intent is that, except as noted below, flex as it presently stands will
ultimately be POSIX conformant (i.e., that those areas of conflict with
the POSIX draft will be resolved in flex's favor). Please bear in mind
that all the comments which follow are with regard to the POSIX draft
standard of Summer 1989, and not the final document (or subsequent
drafts); they are included so flex users can be aware of the
standardization issues and those areas where flex may in the near
future undergo changes incompatible with its current definition.

flex is fully compatible with lex with the following exceptions:

- The lex scanner internal variable yylineno is not supported. It is
  difficult to support this option efficiently, since it requires
  examining every character scanned and reexamining the characters when
  the scanner backs up. Things get more complicated when the end of
  buffer or file is reached or a NUL is scanned (since the scan must
  then be restarted with the proper line number count), or the user
  uses the yyless, unput, or REJECT actions, or the multiple input
  buffer functions.

  The fix is to add rules which, upon seeing a newline, increment
  yylineno. This is usually an easy process, though it can be a drag if
  some of the patterns can match multiple newlines along with other
  characters.

  yylineno is not part of the POSIX draft.
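
For patterns that can match several newlines at once, the counting can
be moved into a small C helper called from the action of each such
rule. The following is only a sketch under stated assumptions, not part
of flex: the names line_count and count_newlines are hypothetical, and
the arguments are assumed to be yytext and yyleng as seen inside an
action.

```c
/* Hypothetical user-maintained counter standing in for yylineno,
   which flex does not provide. */
static int line_count = 1;

/* Count the newlines in a matched token and add them to line_count.
   Inside an action, text and len would be yytext and yyleng. */
static int count_newlines(const char *text, int len)
{
    int n = 0;
    int i;

    for (i = 0; i < len; ++i)
        if (text[i] == '\n')
            ++n;

    line_count += n;
    return n;
}
```

An action for a multi-line pattern would then call
count_newlines( yytext, yyleng ) before doing its other work.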

- The input routine is not redefinable, though it may be called to read
  characters following whatever has been matched by a rule. If input
  encounters an end-of-file the normal yywrap processing is done. A
  "real" end-of-file is returned by input as EOF.

  Input is instead controlled by redefining the YY_INPUT macro.

  The flex restriction that input cannot be redefined is in accordance
  with the POSIX draft, but YY_INPUT has not yet been accepted into the
  draft (and probably won't; it looks like the draft will simply not
  specify any way of controlling the scanner's input other than by
  making an initial assignment to `yyin').
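
As an illustration of redefining YY_INPUT, the sketch below feeds the
scanner from an in-memory string. It assumes the usual three-argument
form of the macro (buffer, result, maximum size), where a result of 0
signals end of input. The names source_text, source_pos, and
string_input are hypothetical and not part of flex.

```c
#include <string.h>

/* Hypothetical in-memory input source; these names are illustrative. */
static const char *source_text = "foo bar\n";
static size_t source_pos = 0;

/* Copy up to max_size bytes into buf.  Returns the number of bytes
   copied, or 0 once the source is exhausted. */
static int string_input(char *buf, int max_size)
{
    size_t left = strlen(source_text) - source_pos;
    size_t n = left < (size_t) max_size ? left : (size_t) max_size;

    memcpy(buf, source_text + source_pos, n);
    source_pos += n;
    return (int) n;
}

/* In a scanner, these two lines would sit inside %{ ... %} in the
   definitions section. */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    { (result) = string_input((buf), (max_size)); }
```

Because the macro only forwards to an ordinary function, the reading
logic can be exercised and debugged outside the scanner.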

- flex scanners do not use stdio for input. Because of this, when
  writing an interactive scanner one must explicitly call fflush on the
  stream associated with the terminal after writing out a prompt. With
  lex such writes are automatically flushed since lex scanners use
  getchar for their input. Also, when writing interactive scanners with
  flex, the `-I' flag must be used.
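
The explicit flush might look like the following sketch. The helper
name prompt is hypothetical; only fputs and fflush from the standard C
library are used.

```c
#include <stdio.h>

/* Write a prompt and flush it, so that it is visible before the
   scanner blocks waiting for input.  Returns 0 on success and a
   nonzero value on failure. */
static int prompt(FILE *out, const char *text)
{
    if (fputs(text, out) == EOF)
        return -1;
    return fflush(out);
}
```

An interactive program would call prompt( stdout, "> " ) just before
each call to the scanner that is expected to read a new line.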

- flex scanners are not as reentrant as lex scanners. In particular, if
  you have an interactive scanner and an interrupt handler which
  long-jumps out of the scanner, and the scanner is subsequently called
  again, you may get the following message:

      fatal flex scanner internal error--end of buffer missed

  To reenter the scanner, first use

      yyrestart( yyin );

- output is not supported. Output from the ECHO macro is done to the
  file-pointer yyout (default stdout).

  The POSIX draft mentions that an output routine exists but currently
  gives no details as to what it does.

- lex does not support exclusive start conditions (`%x'), though they
  are in the current POSIX draft.

- When definitions are expanded, flex encloses them in parentheses.
  With lex, the following:

      NAME    [A-Z][A-Z0-9]*
      %%
      foo{NAME}?      printf( "Found it\n" );
      %%

  will not match the string `foo' because, when the macro is expanded,
  the rule is equivalent to `foo[A-Z][A-Z0-9]*?' and the precedence is
  such that the `?' is associated with `[A-Z0-9]*'. With flex, the rule
  will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the string `foo'
  will match. Note that because of this, the `^', `$', `<s>', `/', and
  `<<EOF>>' operators cannot be used in a flex definition.

  The POSIX draft interpretation is the same as in flex.

- To specify a character class which matches anything but a `]', with
  lex one can use `[^]]' but with flex one must use `[^\]]'. The latter
  works with lex, too.

- If you are providing your own yywrap routine, you must include a
  `#undef yywrap' in the definitions section (section 1). Note that the
  `#undef' will have to be enclosed in `%{' and `%}'.

  The POSIX draft specifies that yywrap is a function, and this is very
  unlikely to change; so flex users are warned that yywrap is likely to
  be changed to a function in the near future.

- After a call to unput, yytext and yyleng are undefined until the next
  token is matched. This is not the case with lex or the present POSIX
  draft.

- lex interprets `abc{1,3}' as "match one, two, or three occurrences of
  `abc'," whereas flex interprets it as "match `ab' followed by one,
  two, or three occurrences of `c'." The latter is in agreement with
  the current POSIX draft.

- lex interprets `^foo|bar' as "match either `foo' at the beginning of
  a line, or `bar' anywhere", whereas flex interprets it as "match
  either `foo' or `bar' if they come at the beginning of a line". The
  latter is in agreement with the current POSIX draft.

- To refer to yytext outside of the scanner source file, the correct
  declaration with flex is `extern char *yytext' rather than
  `extern char yytext[]'. This is contrary to the current POSIX draft
  but a point on which flex will not be changing, as the array
  representation entails a serious performance penalty. It is hoped
  that the POSIX draft will be amended to support the flex variety of
  declaration (as this is a fairly painless change to require of lex
  users).

- `yyin' is initialized by lex to be stdin; flex, on the other hand,
  initializes `yyin' to NULL and then assigns it to stdin the first
  time the scanner is called, providing `yyin' has not already been
  assigned to a non-NULL value. The difference is subtle, but the net
  effect is that with flex scanners, `yyin' does not have a valid value
  until the scanner has been called.
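
The lazy initialization just described can be pictured with the toy
model below. It is not flex's actual source: model_yyin and
model_scanner_entry are hypothetical stand-ins for `yyin' and for the
check made on entry to the scanner.

```c
#include <stdio.h>

/* Stand-in for the scanner's yyin; a real flex scanner defines yyin
   itself.  It starts out as NULL, as described above. */
static FILE *model_yyin = NULL;

/* Models entry to the scanner: stdin is assigned only if nothing else
   has been assigned beforehand. */
static FILE *model_scanner_entry(void)
{
    if (model_yyin == NULL)
        model_yyin = stdin;
    return model_yyin;
}
```

Code that needs `yyin' before the first scan should therefore assign it
explicitly rather than assume it already refers to stdin.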

- The special table-size declarations such as `%a' supported by lex are
  not required by flex scanners; flex ignores them.

- The name FLEX_SCANNER is #define'd so scanners may be written for use
  with either flex or lex.

The following flex features are not included in lex or the POSIX draft
standard:

    yyterminate()
    <<EOF>>
    YY_DECL
    #line directives
    `%{}' around actions
    yyrestart()
    comments beginning with `#' (deprecated)
    multiple actions on a line

This last feature refers to the fact that with flex you can put
multiple actions on the same line, separated with semicolons, while
with lex, the following

    foo    handle_foo(); ++num_foos_seen;

is (rather surprisingly) truncated to

    foo    handle_foo();

flex does not truncate the action. Actions that are not enclosed in
braces are simply terminated at the end of the line.