| Parsing as Algebra | ||||||||||||||||||||||||||||||||||
| This study
connects two areas not generally associated in CS -
parsing and algebra. I will show that parsing in context
free grammars can be viewed as a species of algebra
concerned with simple laws of addition. So where to begin? Parsing or algebra? Lets begin with algebra, and I emphasise that the algebra we are looking at is very simple and relates to basic properties of addition and equality. The first rule needs no justification. 1. For all x, x = x. The second is almost as simple; it enables us to cancel common terms on each side of an equation. 2. The equality x + y = x + z may be proved if y = z can be proved. The third pair reflects the associativity of addition 3a. (x + y) + z = w may be
proved if x + (y + z) = w can be proved The fourth rule allows us to substitute equals for equals 4. Given x = y, then x + z = w can be proved if y + z = w can be proved. The final rule allows us to add zero to each side of an equality. 5. x = y can be proved if x + 0 = y + 0 can be proved. and that's it. Now what is the significance of these rules? Actually the significance is that when suitably animated (my jargon, meaning, applied in a particular control order using the computer), they form a model for a top-down recursive descent parser. One oddity about this, is that it is not obvious at first sight, but becomes obvious after an example. Doing a Proof (or is it Parsing?) OK, how does this work? The parsing problem has to be placed in a canonical form of the following kind. 1. Every grammar rule used in
the parse of the form x --> y z has to be expressed as an equality x
= (y + z). These equalities are all assumptions in
the proof. Example: Here are a bunch of grammar rules used as assumptions 1. s = (np + vp) We want to prove "the boy
jumps" is grammatical which means that we want to
prove the equality Theorem: s =
("the" + ("boy" + "jumps"))
* This step shows why we need rule 5 at the beginning, without the zero the step is not formally correct A Proof Procedure There is a proof procedure for solving these problems which corresponds to top down recursive descent parsing. 1. Apply rule 5 at the beginning
only. Doing It in Prolog Generally I'm a fan of functional programming, but this procedure lends itself to a Prolog formulation that is a little gem of 8 lines. \********************************************************************************\ \A 8 line algebra ATP in Qi Prolog (or is it really a recursive descent parser?). Users of regular Prolog need to put in those tedious commas into the lists.\ (m-prolog " parse([s = S], A) :- parsing([[s + 0] = [S + 0] ], A). parsing([X = X],_). member(X,[X | _]). \***********************************************************************************\ And here is an example of it working. (8-) (set *grammar* [ (9-) (prolog? (parse [s =
["the" + ["boy" +
"jumps"]]] (value *grammar*))) Conclusions There is something rather satisfying about seeing the unity of two apparently different processes, and I found the above rather pleasing, like a strange pebble picked up from the beach that when polished shows up a beautiful colour. However non-utilitarian the insight, it has always given me a strange pleasure to see how the mental gymnastics I learnt as a boy can find a purchase in formal language recognition. Collecting these pebbles can become a passion. Mark Copyright (c) 2009, Mark Tarver |