Unknown
This is mostly old work experimenting with different parser-generator technologies for dewy. The above demo is a C implementation of a Scannerless Right-Nulled Generalized Left-Right (SRNGLR) parser-generator. You can specify a grammar using my context-free grammar syntax, and provide an input string which will be parsed according to that grammar. You can then view parse forest and other various outputs from the parser in the results table. There are also several example grammars you can choose from to get started.
Some interesting features of SRNGLR parsers:
The main papers on on implementing SRNGLR parsers are Right-Nulled GLR Parsers and Faster Scannerless GLR Parsing
In addition to the C SRNGLR parser, I also worked on C and Python implementations of Clustered Nonterminal Parsing (CNP) and a Python implementation of Functional GLL Parsing
I am quite convinced that GLL parsing is the future, due to the ease of implementation, and the versatility afforded by generalized parsing techniques. It will be a killer feature in dewy when I finally am able to get it working. The main issue I had with all of these generalized parser generators is the data structure they produce—SRNGLR produces a parse forest, and CNP and GLL produce a Binary Subtree Representation (see: Clustered Nonterminal Parsing). Selecting from ambiguities and converting to the more standard Abstract Syntax Trees (AST) structure is non-trivial. Working on these had become a time sink, so for now, development on parser-generators has been put on hold in lieu of hand crafting a parser.
All programming languages start with a compiler, which itself is made of several parts, namely: lexing, parsing, semantic analysis, and code generation. Eventually, dewy will make use of some sort of scannerless generalized parser-generator (likely GLL) which allows the lexing and parsing phases be combined into a single step, as well as allowing for an extremely simple parser implementation (a particularly useful property for bootstrapping the language).
To parse a mathematical expression like
we first have to define the grammar for how raw text gets converted to a parse tree. For this, I've developed the Dewy Meta Language which allows for the specification of any Context Free Grammar (CFG), as well as some context sensitive grammars. Additionally, the meta-language / parser are optimized to allow for arbitrary unicode characters as part of the language alphabet, whereas parsers are often limited a much smaller alphabet, e.g. ASCII.1 + 2 * 3
Normally a grammar must be unambiguous to work with standard LR, LALR, etc. parsers. This complicates the process of writing the grammar, as often times, the natural way to express a language will be ambiguous, and require careful work to disambiguate. For the math expression
, or any other math expression, the unambiguous version of the grammar might look like this:1 + 2 * 3
Precedence is handled by restricting which expressions can be subexpressions of the different expressions in the grammar. Associativity is also handled in a similar fashion, namely the left or right hand side is restricted to specific subexpression types that generate the correct associativity. Ultimately though, because generalized parsers can handle ambiguities, the grammar can be simplified to something like this:
Note that for the ambiguous grammar, precedence and associativity still need to be handled at some point in the process. Generalized parsers just provide the flexibility to handle them later in the parsing process, when it is much more convenient.
Running the first grammar on
we get the following parse tree1 + 2 * 3
while parsing the same input with the ambiguous grammar returns the following parse forest:
Notice the ambiguity nodes for
on lines 4 and 35, representing the two options for parsing the expression, namely #E
vs (1 + 2) * 3
respectively1 + (2 * 3)
Each of the parser implementations is on its own branch of the dewy github repo:
Clone the repo and checkout the branch you want to see:
git clone git@github.com:david-andrew/dewy-lang.git
cd dewy-lang
git checkout C_SRNGLR_Parser #or C_Clustered_Nonterminal_Parser or Python_GLL_Parser
cd src
The two C parsers can be built and run with:
make dewy
./dewy path/to/grammar/file path/to/source/file
The project includes several test grammar/source file pairs in the
directory. e.g. the simple expression grammars from above could be run like so:dewy-lang/tests/
./dewy ../tests/8.grammar ../tests/8.source #ambiguous version
./dewy ../tests/3.grammar ../tests/3.source #unambiguous version
The Python parser (which only operates on internal data) can be run directly with:
python fungll.py