Chapter 2 - Programming Language Syntax

Specifying Syntax: Regular expressions and context-free grammars

Tokens and regular expressions

$$number \rightarrow integer | real $$ $$integer \rightarrow digit\ digit* $$ $$real \rightarrow\ integer\ exponent\ |\ decimal\ (\ exponent\ |\ \epsilon\ )$$ $$decimal \rightarrow\ digit*\ (\ .\ digit\ |\ digit\ .\ )\ digit*$$ $$exponent \rightarrow\ (\ e\ |\ E\ )\ (\ +\ |\ -\ |\ \epsilon\ )\ integer$$ $$digit \rightarrow\ 1\ |\ 2\ |\ 3\ |\ 4\ |\ 5\ |\ 6\ |\ 7\ |\ 8\ |\ 9\ |\ 0$$

For example, the number 13 can be derived as follows:

$$number \Rightarrow integer \Rightarrow digit\ digit*$$ $$\ \Rightarrow 1\ digit* \Rightarrow 1\ 3\ digit* \Rightarrow 1\ 3$$
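The regular definitions above translate directly into a hand-written recognizer. The following C sketch is one possible encoding, assuming the whole input string must be a single number token; the function name `is_number` is hypothetical. It folds the two alternatives of the decimal rule into "digits, a dot, digits, with at least one digit overall", which accepts the same strings.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true iff the entire string s is a `number` token:
   integer | integer exponent | decimal | decimal exponent. */
bool is_number(const char *s) {
    int digits = 0;                       /* digits before the exponent */
    while (isdigit((unsigned char)*s)) { s++; digits++; }
    if (*s == '.') {                      /* optional decimal point */
        s++;
        while (isdigit((unsigned char)*s)) { s++; digits++; }
    }
    if (digits == 0)                      /* rejects "", ".", "e5" */
        return false;
    if (*s == 'e' || *s == 'E') {         /* exponent -> (e|E)(+|-|eps) integer */
        s++;
        if (*s == '+' || *s == '-') s++;
        int exp_digits = 0;
        while (isdigit((unsigned char)*s)) { s++; exp_digits++; }
        if (exp_digits == 0) return false;
    }
    return *s == '\0';                    /* nothing may follow the token */
}
```

With this sketch, `13`, `3.14`, `.5`, `1.` and `6.02e23` are accepted, while `.`, `e5` and `1e` are rejected.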

Character sets and formatting issues

Other applications of regular expressions

"a4b6" =~ /a([0-9])b([0-9])/

$ perl
"a4b6" =~ /a([0-9])b([0-9])/;
print $1 . "\n";
print $2 . "\n";
^D
4
6
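The same capture can be done in C with the POSIX `<regex.h>` interface, which is available on Unix-like systems but is not part of ISO C. This is a minimal sketch; `capture_digits` is a hypothetical helper name. `REG_EXTENDED` enables the unescaped `( )` grouping used in the Perl pattern, and `regexec` fills `m[1]` and `m[2]` with the offsets of the two groups, the counterparts of Perl's `$1` and `$2`.

```c
#include <regex.h>

/* Match s against a([0-9])b([0-9]). On success, store the two captured
   digits in *d1 and *d2 (like Perl's $1 and $2) and return 1; else 0. */
int capture_digits(const char *s, char *d1, char *d2) {
    regex_t re;
    regmatch_t m[3];                      /* m[0]: whole match; m[1], m[2]: groups */
    if (regcomp(&re, "a([0-9])b([0-9])", REG_EXTENDED) != 0)
        return 0;                         /* pattern failed to compile */
    int ok = regexec(&re, s, 3, m, 0) == 0;
    if (ok) {
        *d1 = s[m[1].rm_so];              /* each group spans exactly one char */
        *d2 = s[m[2].rm_so];
    }
    regfree(&re);
    return ok;
}
```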

Context-free grammars in Backus-Naur Form (BNF)

$$op \rightarrow +\ |\ -\ |\ *\ |\ /$$

The vertical bar separates alternatives, so the rule above is shorthand for four separate productions:

$$op \rightarrow +$$ $$op \rightarrow -$$ $$op \rightarrow *$$ $$op \rightarrow /$$

Derivations and parse trees

$$expr \rightarrow id\ |\ number\ |\ -\ expr\ |\ expr\ op\ expr\ |\ (\ expr\ )$$ $$op \rightarrow +\ |\ -\ |\ *\ |\ /$$

$$expr \Rightarrow expr\ op\ expr\ \Rightarrow expr\ op\ id \Rightarrow expr\ +\ id$$ $$\Rightarrow expr\ op\ expr\ +\ id\ \Rightarrow expr\ op\ id\ +\ id\ \Rightarrow expr\ *\ id\ +\ id$$ $$\Rightarrow id\ *\ id\ +\ id$$

Using ⇒* to mean "derives in zero or more steps", the whole derivation can be abbreviated as:

$$expr \Rightarrow^{\star} id\ *\ id\ +\ id$$

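The expression grammar above is ambiguous: id * id + id, for instance, has more than one parse tree. The following grammar removes the ambiguity by capturing both operator precedence and left associativity: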

$$expr \rightarrow term\ |\ expr\ addop\ term$$ $$term \rightarrow factor\ |\ term\ multop\ factor$$ $$factor \rightarrow id\ |\ number\ |\ -\ factor\ |\ (\ expr\ )$$ $$addop \rightarrow\ +\ |\ -$$ $$multop \rightarrow\ *\ |\ /$$
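Precedence and associativity in this grammar carry over directly to evaluation. Below is a minimal C sketch, assuming integer numbers only (no id) and no error reporting. Each nonterminal becomes a function. Recursive descent cannot use left recursion directly, so expr → expr addop term and term → term multop factor are rewritten as loops, which still group operators from the left. The names `evaluate`, `expr`, `term` and `factor` are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *p;                 /* cursor into the input */

static long expr(void);

static void skip_ws(void) { while (isspace((unsigned char)*p)) p++; }

/* factor -> number | - factor | ( expr ) */
static long factor(void) {
    skip_ws();
    if (*p == '-') { p++; return -factor(); }
    if (*p == '(') {
        p++;
        long v = expr();
        skip_ws();
        if (*p == ')') p++;           /* a real parser would report a missing ')' */
        return v;
    }
    char *end;
    long v = strtol(p, &end, 10);
    p = end;
    return v;
}

/* term -> factor { multop factor }, evaluated left to right */
static long term(void) {
    long v = factor();
    for (;;) {
        skip_ws();
        if (*p == '*')      { p++; v *= factor(); }
        else if (*p == '/') { p++; v /= factor(); }
        else return v;
    }
}

/* expr -> term { addop term }, evaluated left to right */
static long expr(void) {
    long v = term();
    for (;;) {
        skip_ws();
        if (*p == '+')      { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

long evaluate(const char *s) { p = s; return expr(); }
```

Here `evaluate("10 - 4 - 3")` yields 3, that is (10 - 4) - 3. `evaluate("2 + 3 * 4")` yields 14, because multiplication is handled one level deeper in the grammar.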

Scanning

Scanning and automata

$$(S,\ \Sigma,\ Move,\ S_0,\ Final)$$

where S is the set of all states in the NFA, Sigma is the set of input symbols, Move is the transition function that maps a state/symbol pair to a set of states, S0 is the initial state, and Final is the set of accepting (final) states (see Section 3.6 of the Dragon Book).
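A small NFA can be stored as exactly this tuple. The sketch below is a hypothetical example for the regular expression (a|b)*abb, assuming states {0, 1, 2, 3}, S0 = 0, Final = {3}, and no ε-transitions. Each set of states is a bit mask, so applying Move to a set means taking the union of Move over its members. The input is accepted if the final set of states contains an accepting state.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSTATES 4
enum { SYM_A, SYM_B, NSYMS };             /* Sigma = {a, b} */

/* Move: move_tab[s][c] is the bit set of states reachable from s on c. */
static const uint32_t move_tab[NSTATES][NSYMS] = {
    /* state 0 */ { (1u << 0) | (1u << 1), 1u << 0 },
    /* state 1 */ { 0,                     1u << 2 },
    /* state 2 */ { 0,                     1u << 3 },
    /* state 3 */ { 0,                     0       },
};
static const uint32_t start_set = 1u << 0;   /* {S0} */
static const uint32_t final_set = 1u << 3;   /* Final */

/* Move lifted to sets: union of move_tab over every state in `set`. */
static uint32_t move_set(uint32_t set, int sym) {
    uint32_t next = 0;
    for (int s = 0; s < NSTATES; s++)
        if (set & (1u << s)) next |= move_tab[s][sym];
    return next;
}

bool nfa_accepts(const char *input) {
    uint32_t current = start_set;
    for (; *input; input++) {
        int sym;
        if (*input == 'a')      sym = SYM_A;
        else if (*input == 'b') sym = SYM_B;
        else return false;                /* symbol not in Sigma */
        current = move_set(current, sym);
    }
    return (current & final_set) != 0;
}
```

Tracing `abb`, the state sets are {0}, then {0,1}, then {0,2}, then {0,3}. The last set contains state 3, so the string is accepted.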

Generating a finite automaton

Parsing

$$idlist \rightarrow id\ idlisttail$$ $$idlisttail \rightarrow ,\ id\ idlisttail$$ $$idlisttail \rightarrow ;$$
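A top-down parser for this grammar needs one function per nonterminal and one token of lookahead: idlisttail picks its production by checking whether the next token is `,` or `;`. Below is a minimal C sketch, assuming an id is a run of letters and that blanks may separate tokens. `parse_idlist` and the helper names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

static const char *in;                    /* unread input */

static void skip_blanks(void) { while (*in == ' ') in++; }

/* Match one id (a run of letters). */
static bool match_id(void) {
    skip_blanks();
    if (!isalpha((unsigned char)*in)) return false;
    while (isalpha((unsigned char)*in)) in++;
    return true;
}

/* idlisttail -> , id idlisttail | ; */
static bool idlisttail(void) {
    skip_blanks();
    if (*in == ',') { in++; return match_id() && idlisttail(); }
    if (*in == ';') { in++; return true; }
    return false;                         /* neither production applies */
}

/* idlist -> id idlisttail, and nothing may follow the ';' */
bool parse_idlist(const char *s) {
    in = s;
    if (!(match_id() && idlisttail())) return false;
    skip_blanks();
    return *in == '\0';
}
```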

Parsing "A, B, C;" with the previous grammar, top-down, step 1

Parsing "A, B, C;" with the previous grammar, top-down, step 2

Parsing "A, B, C;" with the previous grammar, top-down, step 3

Parsing "A, B, C;" with the previous grammar, top-down, step 4

Bottom-up parsing

A better bottom-up grammar for lists

$$idlist \rightarrow idlistprefix\ ;$$ $$idlistprefix \rightarrow idlistprefix\ ,\ id$$ $$idlistprefix \rightarrow id$$
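Because idlistprefix is left-recursive, a shift-reduce parser can reduce as soon as each id arrives. As a result, the stack never holds more than three symbols, however long the list is. The sketch below is a simplified recognizer, not a table-driven LR parser. It assumes pre-tokenized input (`i` for id) and reduces greedily whenever a right-hand side is on top of the stack, which is sufficient for this particular grammar. The names are hypothetical.

```c
#include <stdbool.h>

#define MAXSTACK 64

/* Tokens: 'i' = id, ',' and ';'. Stack nonterminals: 'P' = idlistprefix,
   'L' = idlist. *max_depth receives the deepest the stack got. */
bool shift_reduce(const char *tokens, int *max_depth) {
    char stack[MAXSTACK];
    int top = 0;                          /* number of symbols on the stack */
    *max_depth = 0;
    for (const char *t = tokens; *t; t++) {
        if (top == MAXSTACK) return false;
        stack[top++] = *t;                /* shift */
        if (top > *max_depth) *max_depth = top;
        for (;;) {                        /* reduce while a handle is on top */
            if (top == 1 && stack[0] == 'i') {
                stack[0] = 'P';                       /* idlistprefix -> id */
            } else if (top >= 3 && stack[top-1] == 'i' &&
                       stack[top-2] == ',' && stack[top-3] == 'P') {
                top -= 3; stack[top++] = 'P';         /* idlistprefix -> idlistprefix , id */
            } else if (top >= 2 && stack[top-1] == ';' && stack[top-2] == 'P') {
                top -= 2; stack[top++] = 'L';         /* idlist -> idlistprefix ; */
            } else {
                break;
            }
        }
    }
    return top == 1 && stack[0] == 'L';
}
```

For `i,i,i;` the parse succeeds with a maximum depth of 3, and the depth stays at 3 for longer lists.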

Step 1: Parsing "A, B, C;" bottom-up

Step 2: Parsing "A, B, C;" bottom-up

Step 3: Parsing "A, B, C;" bottom-up

Step 4: Parsing "A, B, C;" bottom-up

Recursive descent for the win

Figure 2.15: Example LL grammar for a simple calculator language

$$program \rightarrow stmtlist\ EOD$$ $$stmtlist \rightarrow stmt\ stmtlist\ |\ \epsilon$$ $$stmt \rightarrow id\ :=\ expr\ |\ read\ id\ |\ write\ expr$$ $$expr \rightarrow term\ termtail$$ $$termtail \rightarrow addop\ term\ termtail\ |\ \epsilon$$

Figure 2.15 continued: Example LL grammar for a simple calculator language

$$term \rightarrow factor\ factortail$$ $$factortail \rightarrow multop\ factor\ factortail\ |\ \epsilon$$ $$factor \rightarrow (\ expr\ )\ |\ id\ |\ number$$ $$addop \rightarrow +\ |\ -$$ $$multop \rightarrow *\ |\ /$$
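A predictive parser chooses among a nonterminal's productions by looking at the next token. For the ε-productions, that choice depends on FOLLOW sets. Worked out by hand from the grammar above, the relevant sets are:

$$FIRST(factor) = FIRST(term) = FIRST(expr) = \{\ (,\ id,\ number\ \}$$ $$FOLLOW(stmtlist) = \{\ EOD\ \}$$ $$FOLLOW(expr) = FOLLOW(termtail) = \{\ ),\ id,\ read,\ write,\ EOD\ \}$$ $$FOLLOW(term) = FOLLOW(factortail) = \{\ +,\ -,\ ),\ id,\ read,\ write,\ EOD\ \}$$

So termtail → ε is predicted when the next token is ), id, read, write, or EOD, and termtail → addop term termtail is predicted when it is + or -.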

Example 2.24 "Sum and average program"

 read A
 read B
 sum := A + B
 write sum
 write sum / 2

Figure 2.17 Recursive Descent Parse of example 2.24

Figure 2.17 Lemon Parse of example 2.24

Recursive Descent Parser for example 2.24 in C

parser.c
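The parser.c file itself is not reproduced here. As a rough, independent sketch of the same idea, the following C recognizer has one function per nonterminal of the Figure 2.15 grammar and uses the current token to choose a production. It assumes a toy scanner: identifiers are letters or digits starting with a letter, numbers are digit strings, and read and write are keywords. It reports only success or failure, and all names are hypothetical. The tail-recursive stmtlist rule is written as a loop.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum { T_ID, T_NUM, T_READ, T_WRITE, T_ASSIGN, T_PLUS, T_MINUS,
               T_TIMES, T_DIV, T_LPAREN, T_RPAREN, T_EOD, T_ERROR } Token;

static const char *src;   /* unread input */
static Token tok;         /* one-token lookahead */

/* Toy scanner: advance tok to the next token in src. */
static void next(void) {
    while (isspace((unsigned char)*src)) src++;
    if (*src == '\0') { tok = T_EOD; return; }
    if (isalpha((unsigned char)*src)) {
        const char *start = src;
        while (isalnum((unsigned char)*src)) src++;
        size_t len = (size_t)(src - start);
        if (len == 4 && strncmp(start, "read", 4) == 0)       tok = T_READ;
        else if (len == 5 && strncmp(start, "write", 5) == 0) tok = T_WRITE;
        else                                                  tok = T_ID;
        return;
    }
    if (isdigit((unsigned char)*src)) {
        while (isdigit((unsigned char)*src)) src++;
        tok = T_NUM;
        return;
    }
    if (src[0] == ':' && src[1] == '=') { src += 2; tok = T_ASSIGN; return; }
    switch (*src++) {
    case '+': tok = T_PLUS;   break;
    case '-': tok = T_MINUS;  break;
    case '*': tok = T_TIMES;  break;
    case '/': tok = T_DIV;    break;
    case '(': tok = T_LPAREN; break;
    case ')': tok = T_RPAREN; break;
    default:  tok = T_ERROR;  break;
    }
}

/* Consume the expected token, or fail. */
static bool match(Token expected) {
    if (tok != expected) return false;
    next();
    return true;
}

static bool expr(void);

static bool factor(void) {         /* factor -> ( expr ) | id | number */
    if (tok == T_LPAREN) return match(T_LPAREN) && expr() && match(T_RPAREN);
    if (tok == T_ID) return match(T_ID);
    return match(T_NUM);
}

static bool factor_tail(void) {    /* factortail -> multop factor factortail | eps */
    if (tok == T_TIMES || tok == T_DIV) { next(); return factor() && factor_tail(); }
    return true;                   /* eps: the caller checks what follows */
}

static bool term(void) { return factor() && factor_tail(); }

static bool term_tail(void) {      /* termtail -> addop term termtail | eps */
    if (tok == T_PLUS || tok == T_MINUS) { next(); return term() && term_tail(); }
    return true;
}

static bool expr(void) { return term() && term_tail(); }

static bool stmt(void) {           /* stmt -> id := expr | read id | write expr */
    switch (tok) {
    case T_ID:    next(); return match(T_ASSIGN) && expr();
    case T_READ:  next(); return match(T_ID);
    case T_WRITE: next(); return expr();
    default:      return false;
    }
}

/* stmtlist -> stmt stmtlist | eps, written as a loop; any token that
   cannot start a stmt ends the list. */
static bool stmt_list(void) {
    while (tok == T_ID || tok == T_READ || tok == T_WRITE)
        if (!stmt()) return false;
    return true;
}

/* program -> stmtlist EOD */
bool parse_program(const char *text) {
    src = text;
    next();
    return stmt_list() && tok == T_EOD;
}
```

This sketch accepts the sum-and-average program of Example 2.24 and rejects incomplete input such as `sum := A +`.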

Lemon Parser for example 2.24

grammar.y

Recursive Descent (minimal)

Recursive Descent (fully general)

Disambiguation rules

$$stmt \rightarrow if\ condition\ thenclause\ elseclause\ |\ otherstmt$$ $$thenclause \rightarrow then\ stmt$$ $$elseclause \rightarrow else\ stmt\ |\ \epsilon$$
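C has the same dangling-else ambiguity and resolves it with the usual disambiguation rule: an else is paired with the closest unmatched if. In the hypothetical function below, the indentation is written to match that rule. Because the braces are omitted, the else cannot belong to the outer if, so `classify(-1, 1)` returns 0.

```c
/* Returns 1 if a > 0 and b > 0, 2 if a > 0 and b <= 0, otherwise 0. */
int classify(int a, int b) {
    int result = 0;
    if (a > 0)
        if (b > 0)
            result = 1;
        else                /* pairs with the nearer `if (b > 0)` */
            result = 2;
    return result;
}
```

Adding explicit braces removes any doubt for human readers, which is why many style guides require them.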

Bottom-up parsing

Stack contents                    Remaining input
--------------------------------  ---------------
(nil)                             A, B, C;
id(A)                             , B, C;
id(A),                            B, C;
id(A), id(B)                      , C;
id(A), id(B),                     C;
id(A), id(B), id(C)               ;
id(A), id(B), id(C);
id(A), id(B), id(C) id_list_tail
id(A), id(B) id_list_tail
id(A) id_list_tail
id_list

The four reductions at the end of the trace, taken in reverse order, correspond to the four steps of the rightmost derivation

$$idlist \Rightarrow id\ idlisttail \Rightarrow id,\ id\ idlisttail$$ $$\Rightarrow id,\ id,\ id\ idlisttail \Rightarrow id, id, id;$$
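The growth of the stack in the trace above can be reproduced with a small shift-reduce recognizer for the right-recursive grammar. This is a minimal sketch, assuming pre-tokenized input (`i` for id) and greedy reduction, which is adequate for this grammar but is not a general LR parser. The names are hypothetical. Nothing can be reduced until the `;` arrives, so for n identifiers the stack peaks at 2n symbols: six for A, B, C;.

```c
#include <stdbool.h>

#define MAXSTACK 64

/* Tokens: 'i' = id, ',' and ';'. Stack nonterminals: 'T' = idlisttail,
   'L' = idlist. *max_depth receives the deepest the stack got. */
bool rr_shift_reduce(const char *tokens, int *max_depth) {
    char stack[MAXSTACK];
    int top = 0;                          /* number of symbols on the stack */
    *max_depth = 0;
    for (const char *t = tokens; *t; t++) {
        if (top == MAXSTACK) return false;
        stack[top++] = *t;                /* shift */
        if (top > *max_depth) *max_depth = top;
        for (;;) {                        /* reduce while a handle is on top */
            if (stack[top-1] == ';') {
                stack[top-1] = 'T';                   /* idlisttail -> ; */
            } else if (top >= 3 && stack[top-1] == 'T' &&
                       stack[top-2] == 'i' && stack[top-3] == ',') {
                top -= 3; stack[top++] = 'T';         /* idlisttail -> , id idlisttail */
            } else if (top == 2 && stack[0] == 'i' && stack[1] == 'T') {
                top = 0; stack[top++] = 'L';          /* idlist -> id idlisttail */
            } else {
                break;
            }
        }
    }
    return top == 1 && stack[0] == 'L';
}
```

Compare this with the left-recursive idlistprefix grammar, where reductions happen as the input arrives and the stack depth does not depend on the length of the list.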