Sorry, this is not very formal. I'm not a big fan of lexical scanner generators like Lex. I find it easier just to hand-write a lexer, especially when there are a lot of states.

NUMBER: any combination of the digits 0-9 and the decimal point
COMMENT: any characters between a '#' and a '\n'
INDENT: produced if a newline has just occurred and there are more whitespace characters (tabs and spaces) than there were immediately after the previous newline
DEDENT: produced if a newline has just occurred and there are fewer whitespace characters than there were immediately after the previous newline. This is also produced before an eof when the eof is encountered after one or more lines have been indented
EOL: a single '\n', or two newlines with any number of whitespace characters between them. This is also produced before an eof, even if there is no '\n', to signify the end of the line
RULE: any characters between double quotes
STRING: any characters between single quotes
ASSEMBLY: any characters between '{' and '}'
ID: a sequence of alphabetic characters and whitespace in which each group of consecutive alphabetic characters begins with an uppercase letter
KEYWORD: a sequence of alphabetic characters that does not match any of the above tokens
EOF: produced when there are no more characters in the file
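
The token list above can be sketched as a small hand-written lexer. This is only an illustration in Python, not the project's actual lexer: the function name tokenize, the (kind, text) pair format, and the error handling are all made up for the example. It also makes several guesses where the descriptions are loose: letters are ASCII only, each tab or space counts as one whitespace character, any run of blank lines collapses into a single EOL, an EOL is added before EOF only when the last token is not already an EOL, and at most one DEDENT is emitted before EOF.

```python
import re

# Hypothetical patterns for the tokens that are plain character runs.
# ID is tried before KEYWORD so capitalised words win.
PATTERNS = [
    ('NUMBER', re.compile(r'[0-9.]+')),
    ('ID', re.compile(r'[A-Z][A-Za-z]*(?:[ \t]+[A-Z][A-Za-z]*)*')),
    ('KEYWORD', re.compile(r'[A-Za-z]+')),
]
# Opening delimiter -> (token kind, closing delimiter).
DELIMITED = {'"': ('RULE', '"'), "'": ('STRING', "'"), '{': ('ASSEMBLY', '}')}


def tokenize(src):
    """Return a list of (kind, text) pairs for the token kinds listed above."""
    tokens = []
    pos, n = 0, len(src)
    indent = 0  # whitespace count right after the previous newline

    while pos < n:
        ch = src[pos]
        if ch == '\n':
            # Fold this newline and any blank lines after it into one EOL.
            while True:
                pos += 1
                start = pos
                while pos < n and src[pos] in ' \t':
                    pos += 1
                if pos >= n or src[pos] != '\n':
                    break
            tokens.append(('EOL', '\n'))
            if pos < n:  # the eof case is handled after the loop
                width = pos - start
                if width > indent:
                    tokens.append(('INDENT', ''))
                elif width < indent:
                    tokens.append(('DEDENT', ''))
                indent = width
        elif ch in ' \t':
            pos += 1  # whitespace inside a line separates tokens
        elif ch == '#':
            end = src.find('\n', pos)
            end = n if end == -1 else end
            tokens.append(('COMMENT', src[pos + 1:end]))
            pos = end  # leave the '\n' for the EOL branch
        elif ch in DELIMITED:
            kind, closer = DELIMITED[ch]
            end = src.find(closer, pos + 1)
            if end == -1:
                raise SyntaxError(f'unterminated {kind} at offset {pos}')
            tokens.append((kind, src[pos + 1:end]))
            pos = end + 1
        else:
            for kind, pattern in PATTERNS:
                m = pattern.match(src, pos)
                if m:
                    tokens.append((kind, m.group()))
                    pos = m.end()
                    break
            else:
                raise SyntaxError(f'unexpected {ch!r} at offset {pos}')

    # Assumed order at end of input: EOL, then DEDENT, then EOF.
    if tokens and tokens[-1][0] != 'EOL':
        tokens.append(('EOL', ''))
    if indent > 0:
        tokens.append(('DEDENT', ''))
    tokens.append(('EOF', ''))
    return tokens
```

Under these assumptions, tokenize("a\n  b") gives the kinds KEYWORD, EOL, INDENT, KEYWORD, EOL, DEDENT, EOF. Indentation is compared only with the previous line, as the INDENT and DEDENT descriptions say, rather than with a stack of enclosing levels.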



Copyright 2002 The Newspeak Programming Language Project