Sorry this is not very formal. I'm not a big fan of lexical scanner
generators like LEX. I find it easier just to hand write a lexer, especially
when you have a lot of states.
NUMBER: any combination of the digits 0-9 and the decimal point
COMMENT: any characters between a '#' and a '\n'
INDENT: produced if a newline has just occurred and there are more whitespace
characters (tabs and spaces) than there were immediately after the previous
newline
DEDENT: produced if a newline has just occurred and there are fewer whitespace
characters than there were immediately after the previous newline. This is
also produced before an eof when the eof is encountered after one or more
lines have been indented
EOL: a single '\n', or two newlines with any number of whitespace
characters between them. This is also produced before an eof, even if there
is no '\n', to signify the end of the line
RULE: any characters between double quotes
STRING: any characters between single quotes
ASSEMBLY: any characters between '{' and '}'
ID: a sequence of alphabetic characters and whitespace in which each group
of consecutive alphabetic characters begins with an uppercase letter
KEYWORD: a sequence of alphabetic characters that does not match any of the
above tokens
EOF: produced when there are no more characters in the file
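
The INDENT, DEDENT and EOL rules are the ones with the most state, so here is a rough sketch of how they could be hand written. This is not the actual Newspeak lexer, just an illustration in Python. The function name indent_tokens and the placeholder LINE token are made up for the example. It also assumes three things the descriptions above do not spell out: indentation widths are kept on a stack so every INDENT is eventually closed by one DEDENT, a dedent to a width that was never opened is an error, and at the end of the file the order is EOL, then any remaining DEDENTs, then EOF.

```python
def indent_tokens(source):
    """Return the INDENT/DEDENT/EOL/EOF token names for `source`.

    Everything on a line after its leading whitespace is reported as a
    single placeholder LINE token (hypothetical); a real lexer would go on
    to scan it into NUMBER, ID, KEYWORD and so on.
    """
    tokens = []
    levels = [0]  # assumption: stack of indentation widths currently open

    for line in source.split("\n"):
        body = line.lstrip(" \t")
        if not body:
            # Whitespace-only lines fold into the EOL already produced.
            continue

        width = len(line) - len(body)  # each tab or space counts as one
        if width > levels[-1]:
            levels.append(width)
            tokens.append("INDENT")
        else:
            while width < levels[-1]:
                levels.pop()
                tokens.append("DEDENT")
            if width != levels[-1]:
                # Assumption: the source does not say what happens here.
                raise ValueError("dedent to a width that was never opened")

        tokens.append("LINE")
        tokens.append("EOL")  # produced even if the last line has no '\n'

    # Close whatever indentation is still open, then finish.
    tokens.extend("DEDENT" for _ in levels[1:])
    tokens.append("EOF")
    return tokens


if __name__ == "__main__":
    print(" ".join(indent_tokens("a\n  b\n  c\nd")))
```

Keeping a stack rather than only the previous line's width means a jump from a deeply indented line straight back to column zero produces one DEDENT for each level being closed. If the real lexer only compares against the previous line, the stack could be replaced by a single integer.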
Copyright 2002 The Newspeak Programming Language Project