9.12.10

Yacc is of the Living Dead

The scientific process seems to be evolving towards something like pop culture and the music industry. For starters, we have publishing houses and record companies.

And then, there is a scientific article with the title "Yacc is dead" [1] which, quite possibly, has reached more people than all the accepted papers of the conference to which it was submitted.

Russ Cox criticized this paper quite harshly in an insightful blog post [2].

So, in short, the paper proposes a method that yields all parse trees and claims that it parses "efficiently on average". The authors run a benchmark, but of course a benchmark is not exactly a proof. Russ's argument (by analogy with regexps) goes something like this: we can make up grammars that simulate NFAs, and there are regexps whose NFA has exponentially many successful runs on an input of length n (something like (a|a)*, where every character can be matched by either branch), so asking for all parse trees is asking for trouble.

Where I am not so sure is whether the authors are really unaware of this. It is possible to understand "parses efficiently on average" as "works efficiently for the average grammar programmers want to parse". Writing an ambiguous grammar by accident is one thing; writing something like (a|a)* is something else. Agreed, a statement like "works well for the average grammar" does not really have much scientific worth -- and Russ's criticism still stands -- but it may indeed work out in practice.

The math that derives working parsers from their specification is beautiful... the issue, for practical purposes, is that there are other approaches to combinator parsing -- these are what the derivative-based method should be compared with, not yacc.
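
To make that math concrete, here is a minimal sketch of Brzozowski derivatives for plain regular expressions in Scala -- my own illustration, not code from the paper; roughly speaking, Might and Darais extend this derivative idea to context-free grammars, using laziness and memoization to keep it from diverging, and return parse trees instead of a yes/no answer.

sealed trait Re
case object Empty             extends Re   // matches no string at all
case object Eps               extends Re   // matches only the empty string
case class  Chr(c: Char)      extends Re   // matches exactly the character c
case class  Alt(l: Re, r: Re) extends Re   // l | r
case class  Cat(l: Re, r: Re) extends Re   // l followed by r
case class  Star(r: Re)       extends Re   // zero or more repetitions of r

object Re {
  // Does r accept the empty string?
  def nullable(r: Re): Boolean = r match {
    case Empty      => false
    case Eps        => true
    case Chr(_)     => false
    case Alt(l, rr) => nullable(l) || nullable(rr)
    case Cat(l, rr) => nullable(l) && nullable(rr)
    case Star(_)    => true
  }

  // The derivative of r with respect to c: a regex matching the rest of
  // every string in the language of r that starts with c.
  def deriv(r: Re, c: Char): Re = r match {
    case Empty      => Empty
    case Eps        => Empty
    case Chr(d)     => if (c == d) Eps else Empty
    case Alt(l, rr) => Alt(deriv(l, c), deriv(rr, c))
    case Cat(l, rr) =>
      if (nullable(l)) Alt(Cat(deriv(l, c), rr), deriv(rr, c))
      else Cat(deriv(l, c), rr)
    case Star(rr)   => Cat(deriv(rr, c), Star(rr))
  }

  // Matching: take the derivative character by character, then check nullability.
  def matches(r: Re, s: String): Boolean = nullable(s.foldLeft(r)(deriv))
}

// Re.matches(Cat(Star(Chr('a')), Chr('b')), "aaab")  ==> true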

There is at least one paper (a journal publication) out there that shows how memoization can make top-down functional parsers linear [3?]; unfortunately I cannot access it, so I cannot comment on how applicable it is here. What's next, DMCA for papers? J'accuse.

Regardless of what is up with yacc or memoization, the functional programming literature has a plethora of papers that use parser combinators to achieve polynomial parsing (the first reference that comes to my mind is Swierstra and Duponcheel's paper [4]), and the Wikipedia page on combinator parsing [5] links to more recent -- 2007 -- work by Frost and Hafiz that looks interesting.

There are no conclusions here. Combinator parsing is not new, but with the rising popularity of Scala it is pretty much alive and kicking. After all, Scala ships with a packrat parser combinator library. For my next parser, I will certainly use that one and not Might and Darais's approach. Still, I think the approach described in their paper has educational value, and I sincerely hope the authors will come up with a longer, more polished version.
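
As a taste of what that looks like -- a minimal sketch, assuming the scala.util.parsing.combinator library that ships with Scala; the grammar and the object name are made up for the example -- the PackratParsers trait memoizes results per input position and even tolerates left-recursive rules:

import scala.util.parsing.combinator.{RegexParsers, PackratParsers}

object Arith extends RegexParsers with PackratParsers {
  // Left recursion is fine here because PackratParsers memoizes
  // intermediate results per input position.
  lazy val expr: PackratParser[Int] =
    expr ~ ("+" ~> term) ^^ { case a ~ b => a + b } | term
  lazy val term: PackratParser[Int] =
    term ~ ("*" ~> factor) ^^ { case a ~ b => a * b } | factor
  lazy val factor: PackratParser[Int] =
    "(" ~> expr <~ ")" | """\d+""".r ^^ (_.toInt)

  def main(args: Array[String]): Unit = {
    println(parseAll(expr, "1 + 2 * 3"))   // parses to 7
  }
}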

[1] Matthew Might, David Darais. Yacc is dead.

[2] Russ Cox. Yacc is not dead.

[3?] Frost, Szydlowski. Memoizing purely functional top-down backtracking language processors. Science of Computer Programming, Volume 27, Issue 3, November 1996, Pages 263-288.

[4] Swierstra, Duponcheel. Deterministic, error correcting parser combinators.  

[5] http://en.wikipedia.org/wiki/Parser_combinator

19.8.10

Romantic enthusiasm for verification

Recently, a workshop got me interested in contract programming again. In essence, programming by contract means we annotate our source code with pre- and postconditions:

@Requires(numChars % 2 == 0)
@Ensures(result.length() == numChars)
String foo(int numChars) {
...
}


This is like saying: "the only proper way to call this method is with a numChars parameter that is even. In return, you will get a string of exactly numChars characters." Sounds like a deal.

There is probably a lot to say about contracts and how they can help developers understand their code. There are frameworks that will take these annotations and turn them into run-time checks. My interest is more in verification, though. Automated program verification is really hard to do, because several undecidability results apply -- but, in the words of Prof Jürgen Giesl, this also means that one can always improve the state of the art.
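
To illustrate the run-time-check flavour -- this is not one of those annotation frameworks, just plain Scala, and the method body is made up for the example -- Predef's require checks a precondition and ensuring checks a postcondition on the returned value:

object ContractSketch {
  // require throws IllegalArgumentException when the precondition fails;
  // ensuring throws AssertionError when the postcondition fails.
  def foo(numChars: Int): String = {
    require(numChars % 2 == 0, "precondition numChars % 2 == 0 not satisfied")
    val result = "ab" * (numChars / 2)    // made-up body (assumes numChars >= 0)
    result.ensuring(_.length == numChars)
  }

  def main(args: Array[String]): Unit = {
    println(foo(4))   // prints "abab"
    println(foo(3))   // blows up at run time with the precondition message
  }
}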

The previous time I got excited by contracts was when I had the pleasure of hosting Prof Peter Müller at Google, where he gave a talk and demo of Spec#. The article "Specification and Verification: The Spec# Experience" gives a glimpse of the programming system the authors had built. An amusing quote from there: "When programmers see our demos, they often develop a romantic enthusiasm that does not correspond to verification reality."

Let me state that dream here: a program is made of method calls. If the compiler, or IDE, "understands" the pre-conditions and post-conditions of method calls, it should be able to chain these "facts" together and achieve some form of "understanding" of the program. This can help developers add or change lines and get immediate feedback on whether the source code they write is correct.

Going back to the example above: imagine writing foo(3) in your IDE and having it underlined in squiggly red, with a mouse-over message stating "precondition numChars % 2 == 0 not satisfied".

Spec# translates programs and their pre- and postconditions, via an intermediate verification language called Boogie, into verification conditions that are handed to a Satisfiability Modulo Theories (SMT) solver; the solver behind it is Z3, which has a readable tutorial. Reading up on these, it is quite striking that Spec# works on a real programming language, and that neat interfaces exist between the layers of byte code, verification source and the nitty-gritty predicate logic formulae.
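
To give a hand-waved idea of what such a verification condition might look like -- my own simplification, not Spec#'s actual encoding -- a call foo(n) obliges the verifier to prove something of the shape

\[
  \varphi(n) \;\Longrightarrow\; n \bmod 2 = 0
\]

where \varphi(n) stands for the facts known about n at the call site. The SMT solver is asked whether the negation is satisfiable, and a satisfying assignment is precisely the counterexample behind the squiggly red line. For the literal call foo(3) the right-hand side collapses to 3 mod 2 = 0, which is plainly false; after a call that does verify, the checker gets to assume the postcondition about the result in the code that follows.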

And then, there still is the following flaw, quoted here:

At the moment you find an error, your brain may disappear because
of the Heisenberg uncertainty principle, and be replaced by a new
brain that thinks the proof is correct.
-- L.A. Levin

Just kidding. Robert Pollack's article is concerned with the problem that we can come up with a system that will verify our program, but we won't necessarily be able to trust it, because it is a program itself. Here is the bit that is relevant though: "My guess is that technologies of automated proof search will be highly developed because of economic pressure for reliable hardware and software, and these, applied as tactics to proof checking, will make formal mathematics a practical reality in the foreseeable future."

It seems we still have a long way to go before developers see the light of some form of automated program verification in their IDEs. I have a hope that if one goes back to natural deduction style proofs, logical frameworks or simple logic programming with SLD resolution, it may be possible to write reasonable (logical) annotations and get proper tool support. Let (expert) programmers write more. Only experimentation can show whether this is a good idea.