Specifying Syntax: Regular expressions and context-free grammars

Tokens and regular expressions

$$number \rightarrow integer\ |\ real$$
$$integer \rightarrow digit\ digit*$$
$$real \rightarrow integer\ exponent\ |\ decimal\ (\ exponent\ |\ \epsilon\ )$$
$$decimal \rightarrow digit*\ (\ .\ digit\ |\ digit\ .\ )\ digit*$$
$$exponent \rightarrow (\ e\ |\ E\ )\ (\ +\ |\ -\ |\ \epsilon\ )\ integer$$
$$digit \rightarrow 1\ |\ 2\ |\ 3\ |\ 4\ |\ 5\ |\ 6\ |\ 7\ |\ 8\ |\ 9\ |\ 0$$
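The token grammar above can be transcribed almost symbol for symbol into a regular expression. The following is a minimal sketch using Python's standard `re` module; the constant names and the `is_number` helper are illustrative assumptions, not part of the original slides.

```python
import re

# Each constant mirrors one nonterminal of the grammar above.
DIGIT = r"[0-9]"
INTEGER = rf"{DIGIT}{DIGIT}*"
DECIMAL = rf"{DIGIT}*(?:\.{DIGIT}|{DIGIT}\.){DIGIT}*"
EXPONENT = rf"[eE][+-]?{INTEGER}"
REAL = rf"(?:{INTEGER}{EXPONENT}|{DECIMAL}(?:{EXPONENT})?)"
NUMBER = re.compile(rf"(?:{INTEGER}|{REAL})")


def is_number(text: str) -> bool:
    """Return True if the whole string is a number token (hypothetical helper)."""
    return NUMBER.fullmatch(text) is not None


for sample in ["13", "3.14", "5.", ".5", "6.02e23", "1e-9", ".", "e10"]:
    print(sample, is_number(sample))
```

Using `fullmatch` rather than `search` matters here: the grammar describes a whole token, so a string such as `1e` should be rejected even though it contains a valid integer.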

$$number \Rightarrow integer \Rightarrow digit\ digit*$$
$$\Rightarrow 1\ digit* \Rightarrow 1\ 3\ digit* \Rightarrow 1\ 3$$

Character sets and formatting issues

Other applications of regular expressions

"a4b6" =~ /a([0-9])b([0-9])/

$ perl
"a4b6" =~ /a([0-9])b([0-9])/ ;
print $1 . "\n";
print $2 . "\n";
4
6
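For comparison, the same capture-group idea can be sketched in Python with the standard `re` module. This is only a rough equivalent of the Perl session above; the variable names `first` and `second` are illustrative.

```python
import re

# Parenthesized groups capture the single digits after 'a' and 'b'.
match = re.search(r"a([0-9])b([0-9])", "a4b6")
first, second = match.group(1), match.group(2)
print(first)
print(second)
```

Here `match.group(1)` and `match.group(2)` play the role of Perl's `$1` and `$2`.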

Context-free grammars in Backus-Naur Form (BNF)

$$op \rightarrow +\ |\ -\ |\ *\ |\ /$$

$$op \rightarrow +$$
$$op \rightarrow -$$
$$op \rightarrow *$$
$$op \rightarrow /$$

Derivations and parse trees

$$expr \rightarrow id\ |\ number\ |\ -\ expr\ |\ expr\ op\ expr\ |\ (\ expr\ )$$
$$op \rightarrow +\ |\ -\ |\ *\ |\ /$$

$$expr \Rightarrow expr\ op\ expr \Rightarrow expr\ op\ id \Rightarrow expr\ +\ id$$
$$\Rightarrow expr\ op\ expr\ +\ id \Rightarrow expr\ op\ id\ +\ id \Rightarrow expr\ *\ id\ +\ id$$
$$\Rightarrow id\ *\ id\ +\ id$$
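The derivation above always rewrites the right-most nonterminal. As a small sketch of that process, the code below replays the same sequence of production choices on a list of symbols. The `GRAMMAR` table, the `expand_rightmost` helper, and the `choices` list are assumptions made for illustration, and unary minus is written here as `- expr`.

```python
# Productions of the expression grammar, keyed by nonterminal.
GRAMMAR = {
    "expr": [["id"], ["number"], ["-", "expr"],
             ["expr", "op", "expr"], ["(", "expr", ")"]],
    "op": [["+"], ["-"], ["*"], ["/"]],
}


def expand_rightmost(form, rhs):
    """Replace the right-most nonterminal in `form` with the right-hand side `rhs`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:
            assert rhs in GRAMMAR[form[i]], "not a production of " + form[i]
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")


# Right-hand sides chosen at each step of the derivation shown above.
choices = [
    ["expr", "op", "expr"], ["id"], ["+"],
    ["expr", "op", "expr"], ["id"], ["*"], ["id"],
]

form = ["expr"]
steps = [form]
for rhs in choices:
    form = expand_rightmost(form, rhs)
    steps.append(form)

for step in steps:
    print(" ".join(step))
```

Each printed line is one sentential form; the final line contains only terminals.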

$$expr \Rightarrow^{\star} id\ *\ id\ +\ id$$

$$expr \rightarrow term\ |\ expr\ addop\ term$$
$$term \rightarrow factor\ |\ term\ multop\ factor$$
$$factor \rightarrow id\ |\ number\ |\ -\ factor\ |\ (\ expr\ )$$
$$addop \rightarrow +\ |\ -$$
$$multop \rightarrow *\ |\ /$$

Scanning

Scanning and automata

$$(S,\ \Sigma,\ Move,\ S0,\ Final)$$

where S is the set of all states in the NFA, Σ is the set of input symbols, Move is the transition function that maps a state/symbol pair to a set of states, S0 is the initial state, and Final is the set of accepting (final) states. See Section 3.6 of the Dragon Book.
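To make the five components concrete, here is a minimal sketch of simulating an NFA by tracking the set of states it could be in. The particular automaton, which recognizes strings over {a, b} ending in `abb`, is an assumed example and has no ε-transitions; the names `MOVE`, `move`, and `accepts` are illustrative.

```python
# Hypothetical NFA for (a|b)*abb, written as the five components of the tuple.
STATES = {0, 1, 2, 3}
SIGMA = {"a", "b"}
MOVE = {
    (0, "a"): {0, 1},   # nondeterministic choice: stay in 0 or start matching "abb"
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
S0 = 0
FINAL = {3}


def move(states, symbol):
    """Union of the transition sets of every state in `states` on `symbol`."""
    result = set()
    for s in states:
        result |= MOVE.get((s, symbol), set())
    return result


def accepts(text):
    """Simulate the NFA on `text`; accept if any reachable state is final."""
    current = {S0}
    for ch in text:
        if ch not in SIGMA:
            return False
        current = move(current, ch)
    return bool(current & FINAL)


print(accepts("aabb"), accepts("abba"))
```

Tracking a set of states instead of a single state is what lets a deterministic loop follow every nondeterministic branch at once.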

Generating a finite automaton

Parsing

$$idlist \rightarrow id\ idlisttail$$
$$idlisttail \rightarrow ,\ id\ idlisttail$$
$$idlisttail \rightarrow ;$$
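A top-down parser for this grammar can be sketched with one function per nonterminal, choosing a production by looking at the next token. The tokenizer, the `IdListParser` class, and `parse_id_list` are hypothetical names introduced for this sketch; the tokenizer simply skips characters it does not recognize.

```python
import re


def tokenize(text):
    # Identifiers, commas, and semicolons; anything else is skipped in this sketch.
    return re.findall(r"[A-Za-z_]\w*|[,;]", text)


class IdListParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.ids = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def match_id(self):
        tok = self.peek()
        if tok is None or tok in (",", ";"):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.ids.append(tok)
        self.pos += 1

    def id_list(self):            # idlist -> id idlisttail
        self.match_id()
        self.id_list_tail()

    def id_list_tail(self):
        if self.peek() == ",":    # idlisttail -> , id idlisttail
            self.match(",")
            self.match_id()
            self.id_list_tail()
        else:                     # idlisttail -> ;
            self.match(";")


def parse_id_list(text):
    parser = IdListParser(tokenize(text))
    parser.id_list()
    if parser.peek() is not None:
        raise SyntaxError("trailing input after ';'")
    return parser.ids


print(parse_id_list("A, B, C;"))
```

The call structure mirrors the parse tree: each call to `id_list_tail` corresponds to one `idlisttail` node, built from the root downward.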

Parsing A, B, C; with previous grammar, top-down step 1

Parsing A, B, C; with previous grammar, top-down step 2

Parsing A, B, C; with previous grammar, top-down step 3

Parsing A, B, C; with previous grammar, top-down step 4

Bottom-up parsing

A better bottom-up grammar for lists

$$idlist \rightarrow idlistprefix\ ;$$
$$idlistprefix \rightarrow idlistprefix\ ,\ id$$
$$idlistprefix \rightarrow id$$
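One way to see why this left-recursive grammar suits bottom-up parsing is to track how deep the stack gets. The sketch below is a hand-coded shift-reduce loop with the three productions hard-wired, not a table-driven LR parser; `parse_bottom_up` is a hypothetical name.

```python
def parse_bottom_up(tokens):
    """Shift-reduce parse with idlistprefix; returns (accepted, max stack depth)."""
    stack, max_depth = [], 0
    for tok in tokens:
        # Shift: identifiers are pushed as the token kind 'id'.
        stack.append(tok if tok in (",", ";") else "id")
        max_depth = max(max_depth, len(stack))
        # Reduce as soon as a right-hand side sits on top of the stack.
        if stack == ["id"]:
            stack = ["idlistprefix"]               # idlistprefix -> id
        elif stack[-3:] == ["idlistprefix", ",", "id"]:
            stack[-3:] = ["idlistprefix"]          # idlistprefix -> idlistprefix , id
        elif stack == ["idlistprefix", ";"]:
            stack = ["idlist"]                     # idlist -> idlistprefix ;
    return stack == ["idlist"], max_depth


print(parse_bottom_up(["A", ",", "B", ",", "C", ";"]))
```

Because a reduction is possible after every identifier, the stack in this sketch never holds more than three symbols, however long the list is.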

Step 1: Parsing "A, B, C;" bottom-up

Step 2: Parsing "A, B, C;" bottom-up

Step 3: Parsing "A, B, C;" bottom-up

Step 4: Parsing "A, B, C;" bottom-up

Recursive descent for the win

Figure 2.15: Example LL grammar for simple calculator language

$$program \rightarrow stmtlist\ EOD$$
$$stmtlist \rightarrow stmt\ stmtlist\ |\ \epsilon$$
$$stmt \rightarrow id\ :=\ expr\ |\ read\ id\ |\ write\ expr$$
$$expr \rightarrow term\ termtail$$
$$termtail \rightarrow addop\ term\ termtail\ |\ \epsilon$$

Figure 2.15 continued: Example LL grammar for simple calculator language

$$term \rightarrow factor\ factortail$$
$$factortail \rightarrow multop\ factor\ factortail\ |\ \epsilon$$
$$factor \rightarrow (\ expr\ )\ |\ id\ |\ number$$
$$addop \rightarrow +\ |\ -$$
$$multop \rightarrow *\ |\ /$$

Example 2.24 "Sum and average program"

 read A
 read B
 sum := A + B
 write sum
 write sum / 2
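As a rough illustration of how the calculator grammar maps onto recursive descent, the sketch below has one method per nonterminal and checks that the sum-and-average program parses. It is written in Python rather than C, and it is not the `parser.c` referenced in these slides; the tokenizer, the `Parser` class, and the returned statement count are assumptions made for this example. The ε-productions are taken whenever no other alternative matches, without checking follow sets.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(:=)|([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|([-+*/()]))")


def tokenize(source):
    """Turn source text into (kind, text) pairs, ending with an EOD marker."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        assign, word, number, punct = m.groups()
        if assign:
            tokens.append((":=", assign))
        elif word:
            kind = word if word in ("read", "write") else "id"
            tokens.append((kind, word))
        elif number:
            tokens.append(("number", number))
        else:
            tokens.append((punct, punct))
        pos = m.end()
    tokens.append(("EOD", ""))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    @property
    def kind(self):
        return self.tokens[self.pos][0]

    def match(self, expected):
        if self.kind != expected:
            raise SyntaxError(f"expected {expected}, got {self.kind}")
        self.pos += 1

    def program(self):                 # program -> stmtlist EOD
        count = self.stmt_list()
        self.match("EOD")
        return count                   # number of statements parsed

    def stmt_list(self):               # stmtlist -> stmt stmtlist | epsilon
        if self.kind in ("id", "read", "write"):
            self.stmt()
            return 1 + self.stmt_list()
        return 0

    def stmt(self):
        if self.kind == "id":          # stmt -> id := expr
            self.match("id")
            self.match(":=")
            self.expr()
        elif self.kind == "read":      # stmt -> read id
            self.match("read")
            self.match("id")
        else:                          # stmt -> write expr
            self.match("write")
            self.expr()

    def expr(self):                    # expr -> term termtail
        self.term()
        self.term_tail()

    def term_tail(self):               # termtail -> addop term termtail | epsilon
        if self.kind in ("+", "-"):
            self.match(self.kind)
            self.term()
            self.term_tail()

    def term(self):                    # term -> factor factortail
        self.factor()
        self.factor_tail()

    def factor_tail(self):             # factortail -> multop factor factortail | epsilon
        if self.kind in ("*", "/"):
            self.match(self.kind)
            self.factor()
            self.factor_tail()

    def factor(self):                  # factor -> ( expr ) | id | number
        if self.kind == "(":
            self.match("(")
            self.expr()
            self.match(")")
        elif self.kind in ("id", "number"):
            self.match(self.kind)
        else:
            raise SyntaxError(f"unexpected {self.kind} in factor")


PROGRAM = """
read A
read B
sum := A + B
write sum
write sum / 2
"""
print(Parser(tokenize(PROGRAM)).program())
```

The parser only recognizes the language; building a tree or evaluating the program would mean returning values from each method instead of `None`.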

Figure 2.17 Recursive Descent Parse of example 2.24

Figure 2.17 Lemon Parse of example 2.24

Recursive Descent Parser for example 2.24 in C

parser.c

Lemon Parser for example 2.24

grammar.y

Recursive Descent (minimal)

Recursive Descent (fully general)

Disambiguation rules

$$stmt \rightarrow if\ condition\ thenclause\ elseclause\ |\ otherstmt$$
$$thenclause \rightarrow then\ stmt$$
$$elseclause \rightarrow else\ stmt\ |\ \epsilon$$
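A common disambiguation rule for this grammar is that an `else` pairs with the nearest unmatched `then`. The sketch below shows how a recursive-descent parser gets that behavior simply by consuming an `else` whenever one is next. It assumes `elseclause` begins with the keyword `else`, treats conditions and other statements as single tokens, and does no error recovery; `parse_stmt` is a hypothetical helper.

```python
def parse_stmt(tokens, pos=0):
    """Return (tree, next position) for the stmt starting at tokens[pos]."""
    if tokens[pos] == "if":
        condition = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected 'then'")
        then_part, pos = parse_stmt(tokens, pos + 3)
        else_part = None
        # Disambiguation: the innermost open 'if' takes the 'else' if one follows.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", condition, then_part, else_part), pos
    return tokens[pos], pos + 1        # otherstmt, treated as a single token


tokens = "if c1 then if c2 then s1 else s2".split()
tree, end = parse_stmt(tokens)
print(tree)
```

In the resulting tree the `else s2` branch belongs to the inner `if c2`, and the outer `if c1` has no else part.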

Bottom-up parsing

Stack contents                    Remaining input
--------------------------------  ---------------
(nil)                             A, B, C;
id(A)                             , B, C;
id(A),                            B, C;
id(A), id(B)                      , C;
id(A), id(B),                     C;
id(A), id(B), id(C)               ;
id(A), id(B), id(C);
id(A), id(B), id(C) id_list_tail
id(A), id(B) id_list_tail
id(A) id_list_tail
id_list

The last four lines, the ones that perform reductions rather than shifts, correspond in reverse order to the steps of the derivation

$$idlist \Rightarrow id\ idlisttail \Rightarrow id,\ id\ idlisttail$$
$$\Rightarrow id,\ id,\ id\ idlisttail \Rightarrow id,\ id,\ id;$$
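The trace above can be reproduced with a short sketch. With the right-recursive grammar no right-hand side appears on top of the stack until the `;` has been shifted, so this simplified loop shifts every token first and only then reduces. It records token kinds only (`id`, not `id(A)`), assumes well-formed input, and `trace_right_recursive` is a hypothetical name.

```python
def trace_right_recursive(tokens):
    """Return the stack contents after every shift and reduce step."""
    stack, trace = [], []
    for tok in tokens:                             # shift phase
        stack.append(tok if tok in (",", ";") else "id")
        trace.append(" ".join(stack))
    if stack and stack[-1] == ";":
        stack[-1:] = ["id_list_tail"]              # id_list_tail -> ;
        trace.append(" ".join(stack))
    while stack[-3:] == [",", "id", "id_list_tail"]:
        stack[-3:] = ["id_list_tail"]              # id_list_tail -> , id id_list_tail
        trace.append(" ".join(stack))
    if stack == ["id", "id_list_tail"]:
        stack = ["id_list"]                        # id_list -> id id_list_tail
        trace.append(" ".join(stack))
    return trace


for line in trace_right_recursive(["A", ",", "B", ",", "C", ";"]):
    print(line)
```

The stack grows to hold the entire input before the first reduction, which is the cost of using a right-recursive grammar in a bottom-up parser.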