The program flex has been around a very long time, and works quite well — indeed, you still see it used in various places, including compiling the Linux kernel. While there is some legitimate criticism for its use of global variables (and expectation that the user will also use these), it is fast and efficient.
I would like you to create a simple standalone flex program named step1-exercise.flex. I have a created a skeleton for you to use in this tar file, which also includes a Makefile: step1.tar. (If you are getting any undefined symbols, please try adding "-lfl" to the clang invocation to see if that helps.)
Currently, all the program step1-exercise is count lines in a file:
$ ./step1-exercise < input-01.lang There are 15 lines. $ ./step1-exercise < input-02.lang There are 20 lines.
We would like our step1-exercise to actually recognize the elements in our two test files:
$ ./step1-exercise < input-01.lang line 2: PROGRAM line 2: ID = 'MYPROG' line 3: LBRACE line 4: VARIABLES line 4: COLON line 5: VAR line 5: ID = 'VAR1' line 5: SEMICOLON line 7: FUNCTIONS line 7: COLON line 8: DEFINE line 8: ID = 'FUNC1' line 8: LPARENTHESIS line 8: VAR line 8: ID = 'X' line 8: RPARENTHESIS line 9: LBRACE line 10: IF line 10: LPARENTHESIS line 10: ID = 'X' line 10: RPARENTHESIS line 11: THEN line 12: ID = 'print' line 12: LPARENTHESIS line 12: ID = 'X' line 12: COMMA line 12: ID = 'X' line 12: RPARENTHESIS line 12: SEMICOLON line 13: ELSE line 15: RBRACE line 16: RBRACE $ ./step1-exercise < input-02.lang line 2: PROGRAM line 2: ID = 'MYPROG2' line 2: SEMICOLON line 4: VARIABLES line 5: LBRACE line 6: VAR line 6: ID = 'VAR1' line 6: SEMICOLON line 7: VAR line 7: ID = 'VAR2' line 7: SEMICOLON line 8: RBRACE line 11: FUNCTIONS line 12: LBRACE line 13: DEFINE line 13: ID = 'FUNC1' line 13: LPARENTHESIS line 13: VAR line 13: ID = 'X' line 13: RPARENTHESIS line 13: COLON line 13: ID = 'int' line 14: LBRACE line 15: IF line 15: LPARENTHESIS line 15: ID = 'X' line 15: RPARENTHESIS line 15: THEN line 15: ID = 'print' line 15: LPARENTHESIS line 15: ID = 'X' line 15: COMMA line 15: ID = 'X' line 15: RPARENTHESIS line 15: SEMICOLON line 16: ELSE line 16: ID = 'print' line 16: LPARENTHESIS line 16: ID = 'X' line 16: RPARENTHESIS line 16: SEMICOLON line 17: END line 18: RBRACE line 20: RBRACE
Here is a comprehensive list of possible tokens to recognize:
string token ====== ====== "program" PROGRAM "end" END "variables" VARIABLES "var" VAR "functions" FUNCTIONS "define" DEFINE "if" IF "then" THEN "else" ELSE "while" WHILE "," COMMA "(" LPARENTHESIS ")" RPARENTHESIS "{" LBRACE "}" RBRACE ":" COLON ";" SEMICOLON [a-zA-Z0-9]+ ID
(I removed the useless STATEMENTS token; sorry about including it.)
Also, make sure to set the variable yylval for your return value so that you can print the actual string found for your tokens identified as ID tokens.
Your rules should not do any output; instead, you will want to write a loop around using "yylex()" repeatedly to get the next token; your output should be done in that main loop rather than trying to have your rules do the output.
Submission: Please submit a new step1.tar file on Canvas that has the original contents modified to do the correct recognition of the tokens in the test input files. Your submission is due by 11:59pm on Sunday, September 22.