Home Comparison Feedback Papers About Contact

LRSTAR Parser Generator

  • Creates CLR(1), LR(1), LR(k) parsers in
  • C++
  • Creates parser tables of type
  • Matrix
  • Reads grammar notation with EBNF operators (+,*,?)
  • yes
  • Generates C++ code from a skeleton file
  • yes
  • Parser speed (see note)
  • 1,500,000  lines/second
  • Parsers have a symbol-table builder
  • yes
  • Parsers can do AST construction and traversal
  • yes
  • Parsers have automatic error recovery
  • NO
  • Includes 6 Microsoft Visual C++ projects
  • yes
  • Includes 20 BNF grammars
  • yes

    DFASTAR Lexer Generator

  • Creates DFA lexical analyzers in
  • C++
  • Creates lexer tables of type
  • Matrix, Direct Code
  • Lexer speed (2 times the speed of flex)
  • 30,000,000  tokens/second
  • Lexers do keyword recognition
  • yes
  • Lexers handle Unicode characters
  • NO

    LRSTAR + DFASTAR v 6.4.005 (open-source project)


    CLR(1) Parsers

    You may use the option "clr", to request a Canonical LR(1) parser.  CLR(1) parsers were invented by Donald Knuth in 1965.  They are the easiest type to understand and work with.  However, CLR(1) parsers are much larger than LALR(1) parsers. 

    LR(1) Parsers

    You may use the option "lr", to request a Minimal LR(1) parser.  These are almost as small as LALR(1) parsers, but handle a larger class of grammars.

    LR(k) Parsers

    If using the option "nd=1" or "nd=2", LRSTAR will create a nondeterministic LR(k) parser if your grammar requires it.  An NDLR(k) parser uses 'k' symbols of lookahead (at parsing time) to resolve ambiguities in the grammar.  LR(k) parsers are slightly larger than Minimal LR(1) parsers because there are typically only a few states that require LR(k) parsing (for k > 1).  In those states, the LR(k) parser will try all valid symbols for that state and see how far ahead the parser will go.  Then it will choose the symbol which allows parsing to advance the farthest without error.  'k' is set to a fixed number (usually 2 or 3) by the user in the parser source code. 

    Matrix Tables

    Matrix tables provide the fastest table-driven parsers and lexers.  LRSTAR uses compressed-matrix tables, which offer small tables but still very fast.  Matrix tables compile faster than recursive-descent parsers and direct-code lexers.  Testing has shown that matrix-driven lexers are as fast as direct-code lexers, but are smaller and compile faster.  Fast parsing research shows that recursive-descent parsers can be faster than table-driven parsers, however they have disadvantages.

    Direct Code Lexers

    Direct-code lexers are the fastest, if you need a line counter, otherwise the large table-driven lexers are the fastest.  Direct-code lexers are a reasonable size, whereas the large lexers may be extremely large.  The only disadvantage about direct-code lexers is that you may have a long compile time.  See the bar charts on the comparison page.

    Parser Speed

    Syntax-checking speed is 1,500,000 lines per second.  Syntax-checking speed was measured by reading an input file of C source code of size 2200 lines.  The input file was read 1000 times: opening the file, doing lexical analysis, parsing and building a symbol table (but not an AST).  No output files were created.  No preprocessing was done.  The testing computer is a 3 GHz desktop single-CPU machine.  Note: when building an abstract-syntax tree (AST) the speed is 800,000 lines per second.  A compiler front-end might be able to process 500,000 lines per second at best.

    DFA Lexical Analyzers

    A DFA lexer is a finite-state automata without a pushdown stack (i.e. not a PDA).  DFA is the fastest recognition algorithm (5 times the speed of PDA's).  A DFA algorithm does not use any lookahead characters.  Because it is not a PDA, it cannot handle "nested" comments or any type of "nested" input (e.g. [[a]]).  However, DFA's work fine for most programming languages.  Afterall, the job of handling "nested" types of input usually belongs to the parser.  Before DFA's became available, lexical analysis had been known to consume a large percentage of the processing time in compiler front-ends.  With DFA's that time is reduced to about 10-20% of the compiler front-end.

    Skeleton File

    A skeleton file contains the source code which you want the generator to output.  It includes special notation indicating what code to generate and how to format the parser tables or lexer tables.  This skeleton-file concept makes it feasible to output source code in other programming languages, however C and C++ are better suited to the task of parser generation.  At least, you have the option to modify the parser skeleton code to suit your preferred style of coding.

    (c) Copyright LRSTAR.ORG 2014.  All rights reserved.