This product includes software developed by the University of California, Berkeley and its contributors.
Copyright (c) 1990 The Regents of the University of California. All rights reserved.
This code is derived from software contributed to Berkeley by Vern Paxson.
The United States Government has rights in this work pursuant to contract no. DE-AC03-76SF00098 between the United States Department of Energy and the University of California.
Redistribution and use in source and binary forms are permitted provided that: (1) source distributions retain this entire copyright notice and comment, and (2) distributions including binaries display the following acknowledgement: "This product includes software developed by the University of California, Berkeley and its contributors" in the documentation or other materials provided with the distribution and in all advertising materials mentioning features or use of this software. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
flex, with Examples
flex is a tool for generating scanners: programs which recognize
lexical patterns in text. flex reads the given input files (or
its standard input if no file names are given) for a description of the
scanner to generate. The description is in the form of pairs of regular
expressions and C code, called rules. flex generates as
output a C source file, `lex.yy.c', which defines a routine yylex.
Compile and link this file with the `-lfl' library to
produce an executable. When the executable runs, it analyzes its input
for occurrences of the regular expressions. Whenever it finds one, it
executes the corresponding C code.
Some simple examples follow, to give you the flavor of using flex.
The following flex input specifies a scanner which,
whenever it encounters the string `username', will replace it
with the user's login name:
    %%
    username    printf( "%s", getlogin() );
By default, any text not matched by a flex scanner is copied to
the output, so the net effect of this scanner is to copy its input file
to its output with each occurrence of `username' expanded. In this
input, there is just one rule. `username' is the pattern and the
printf is the action. The `%%' marks the beginning of the
rules.
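To make the "copy everything, expand `username'" behavior concrete, here is a minimal hand-written C sketch of the same effect on an in-memory string. It is not what flex generates; the function name expand_username, its parameters, and the idea of passing the login name in as an argument (instead of calling getlogin) are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of flex: copy `in` to `out`, replacing
 * each occurrence of "username" with `login`.  Returns the number of
 * bytes written (not counting the terminating NUL), or -1 if `out`
 * (of capacity `cap` bytes) is too small.
 */
long expand_username( const char *in, const char *login,
                      char *out, size_t cap )
        {
        static const char pattern[] = "username";
        const size_t plen = sizeof pattern - 1;
        const size_t llen = strlen( login );
        size_t n = 0;

        if ( cap == 0 )
                return -1;

        while ( *in != '\0' )
                {
                if ( strncmp( in, pattern, plen ) == 0 )
                        {
                        /* the rule's action: emit the login name */
                        if ( n + llen >= cap )
                                return -1;
                        memcpy( out + n, login, llen );
                        n += llen;
                        in += plen;
                        }
                else
                        {
                        /* default behavior: copy unmatched text through */
                        if ( n + 1 >= cap )
                                return -1;
                        out[n++] = *in++;
                        }
                }

        out[n] = '\0';
        return (long) n;
        }
```

Calling expand_username( "hi username!", "vern", buf, sizeof buf ) with a sufficiently large buf leaves "hi vern!" in buf. The two branches of the loop correspond to the single rule and to the scanner's default copying of unmatched text.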
Here's another simple example:
            int num_lines = 0, num_chars = 0;

    %%
    \n      ++num_lines; ++num_chars;
    .       ++num_chars;

    %%
    int main( void )
            {
            yylex();
            printf( "# of lines = %d, # of chars = %d\n",
                    num_lines, num_chars );
            return 0;
            }
This scanner counts the number of characters and the number
of lines in its input (it produces no output other than the
final report on the counts). The first line declares two
globals, num_lines and num_chars, which are accessible
both inside yylex and in the main routine declared after
the second `%%'. There are two rules, one which matches a
newline (`\n') and increments both the line count and the
character count, and one which matches any character other
than a newline (indicated by the `.' regular expression).
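For comparison, the two rules can be written out by hand as an ordinary C loop. This is only a sketch of the counting logic, not the code flex produces; the struct counts type and the count_text function are hypothetical names introduced here.

```c
#include <stddef.h>

/* Hypothetical result type mirroring the scanner's two globals. */
struct counts
        {
        int num_lines;
        int num_chars;
        };

/*
 * Count lines and characters in the first `len` bytes of `text`,
 * applying the same two cases as the scanner's rules.
 */
struct counts count_text( const char *text, size_t len )
        {
        struct counts c = { 0, 0 };
        size_t i;

        for ( i = 0; i < len; ++i )
                {
                if ( text[i] == '\n' )
                        {
                        /* first rule: a newline counts as a line and a char */
                        ++c.num_lines;
                        ++c.num_chars;
                        }
                else
                        {
                        /* second rule: any other character */
                        ++c.num_chars;
                        }
                }

        return c;
        }
```

Because every input character falls under exactly one of the two cases, num_chars always ends up equal to the input length, and num_lines equal to the number of newlines.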
A somewhat more complicated example:
    /* scanner for a toy Pascal-like language */

    %{
    /* need this for the calls to atoi() and atof() below */
    #include <stdlib.h>
    %}

    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*

    %%

    {DIGIT}+    {
                printf( "An integer: %s (%d)\n", yytext,
                        atoi( yytext ) );
                }

    {DIGIT}+"."{DIGIT}*        {
                printf( "A float: %s (%g)\n", yytext,
                        atof( yytext ) );
                }

    if|then|begin|end|procedure|function        {
                printf( "A keyword: %s\n", yytext );
                }

    {ID}        printf( "An identifier: %s\n", yytext );

    "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

    "{"[^}\n]*"}"     /* eat up one-line comments */

    [ \t\n]+          /* eat up whitespace */

    .           printf( "Unrecognized character: %s\n", yytext );

    %%

    int main( int argc, char **argv )
        {
        ++argv, --argc;  /* skip over program name */
        if ( argc > 0 )
                yyin = fopen( argv[0], "r" );
        else
                yyin = stdin;

        yylex();
        return 0;
        }
This is the beginning of a simple scanner for a language like Pascal. It identifies different types of tokens and reports on what it has seen.
The details of this example are explained in the following chapters.
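As a rough illustration of what the patterns in the toy Pascal scanner accept, the following C sketch classifies a single, already-isolated token by hand. It is an approximation for reading the patterns, not a description of how the generated scanner works internally; the enum token_kind type and the classify_token function are hypothetical names. Keywords are tested before identifiers here, on the assumption that the keyword rule should win for a word such as `begin', which also fits the ID pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories, one per reporting rule. */
enum token_kind
        {
        TOK_INTEGER, TOK_FLOAT, TOK_KEYWORD,
        TOK_IDENTIFIER, TOK_OPERATOR, TOK_OTHER
        };

static int is_digit( char c ) { return c >= '0' && c <= '9'; }
static int is_lower( char c ) { return c >= 'a' && c <= 'z'; }

enum token_kind classify_token( const char *tok )
        {
        static const char *const keywords[] =
                { "if", "then", "begin", "end", "procedure", "function" };
        const size_t n = strlen( tok );
        size_t i;

        if ( n == 0 )
                return TOK_OTHER;

        if ( is_digit( tok[0] ) )
                {
                for ( i = 0; i < n && is_digit( tok[i] ); ++i )
                        ;
                if ( i == n )
                        return TOK_INTEGER;     /* {DIGIT}+ */
                if ( tok[i] == '.' )
                        {
                        for ( ++i; i < n && is_digit( tok[i] ); ++i )
                                ;
                        if ( i == n )
                                return TOK_FLOAT; /* {DIGIT}+"."{DIGIT}* */
                        }
                return TOK_OTHER;
                }

        for ( i = 0; i < sizeof keywords / sizeof keywords[0]; ++i )
                if ( strcmp( tok, keywords[i] ) == 0 )
                        return TOK_KEYWORD;

        if ( is_lower( tok[0] ) )
                {
                for ( i = 1; i < n; ++i )
                        if ( ! is_lower( tok[i] ) && ! is_digit( tok[i] ) )
                                return TOK_OTHER;
                return TOK_IDENTIFIER;          /* [a-z][a-z0-9]* */
                }

        if ( n == 1 && strchr( "+-*/", tok[0] ) != NULL )
                return TOK_OPERATOR;

        return TOK_OTHER;
        }
```

Note that `3.' is classified as a float, because the float pattern allows zero digits after the decimal point, and that `Foo' falls through to TOK_OTHER, because the ID pattern only admits lowercase letters and digits.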