I wrote a toy parser (or rather a parser specification in the syntax of a parser-generator) for COOL, and I’m wondering whether mature, popular languages actually use parser-generators to produce their parsers.

The PMD project uses a (modified) copy of the JavaCC parser-generator definition of the Java grammar. The file carries an initial 1997 timestamp and names Sriram Sankar as its author. According to Advances in Software Engineering, “The Java 1.1 grammar was developed by Sriram Sankar at Sun Microsystems and a copy of this grammar can be found in the distribution”¹. At a glance, JavaCC’s syntax resembles that of yacc, CUP, and bison. In PMD’s Java.jjt file, the familiar token declarations start around line 234, and the grammar starts at line 1122.

So it turns out that the parser for at least one popular, non-trivial grammar started life (and perhaps still lives) in a .jjt file.

Update, 9 Sept 2013: Josh Haberman on LL and LR parsers:

Despite this vast body of theoretical knowledge, few of the parsers that are in production systems today make use of any of this theory. Many opt instead for hand-written parsers that are not based on any formalism. […] GCC moved away from their Bison-based parser to a handwritten recursive descent parser. The Ruby interpreter MRI may be one of the few remaining mainstream language implementations that does still use Bison (an LR-based tool) for parsing.

[…] pure LL and LR parsers have proven to be largely inadequate for real-world use cases. Many grammars that you’d naturally write for real-world use cases are not LL or LR, as we will see. The two most popular LL and LR-based parsing tools (ANTLR and Bison, respectively) both extend the pure LL and LR algorithms in various ways, adding features such as operator precedence, syntactic/semantic predicates, optional backtracking, and generalized parsing.
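
To make "handwritten recursive descent" concrete, here is a minimal sketch of my own (not taken from GCC, Haberman's article, or any real compiler). It assumes a hypothetical toy grammar of integer arithmetic; the names `tokenize` and `Parser` are invented for illustration. Each grammar rule becomes one method, and operator precedence falls out of which method calls which, where a tool like Bison would rely on precedence declarations instead.

```python
import re

# Hypothetical token pattern for this sketch: integers, + - * / and parentheses.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into a flat list of token strings."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; this sketch evaluates while it parses."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    # expr := term (("+" | "-") term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.advance() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    # term := factor (("*" | "/") factor)*
    # Sits below expr, so * and / bind tighter than + and -.
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.advance() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division keeps the sketch simple
        return value

    # factor := NUMBER | "(" expr ")"
    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")
```

Calling `Parser("1 + 2 * 3").parse()` evaluates the multiplication first because `expr` delegates to `term`. A real language front end would build a syntax tree and report errors far more carefully, but the shape is the same.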

Update, 2023: These days we have tree-sitter and Lezer. Things are getting interesting.


  1. Hakan Erdogmus, Oryal Tanir. Advances in Software Engineering. p. 426. Springer, 2002.