ES parsing tools (Re: Short Functions)

Claus Reinke claus.reinke at talk21.com
Sun May 29 02:11:35 PDT 2011


tl;dr:
    - JS-based PEG + ANTLR as a route for ES grammar experiments
    - would like to know about route viability and alternative routes

>> If OMeta really is slower than similar parsers, and
>> the grammar-optimizing side is covered by the authors,
>> perhaps there is room for old-fashioned JS code optimization?
>> For instance, OMeta/JS is not the main line of OMeta code,
>> and code that is idiomatic in another language might be
>> less than optimal if moved to JS.
>
> Does this line of inquiry do anything for (a) the spec grammar
> needing validation (no ambiguities); (b) implementor acceptability?

As indicated by the change of subject, this is a separate topic,
triggered by remarks made by several posters on the previous
topic. Having good tools supporting practical testing of proposals
is important to proposal quality, and several remarks suggested
a tooling problem in the experimental ES extension parsing area.

Since committee members have invested in OMeta usage,
knowing whether or not their work can be sped up without a
complete reimplementation is relevant.

> I don't see how (a) is served. Please correct me if I'm missing something.

To repeat my previous suggestion for (a): have a look at
ANTLRWorks[0]. While I have not tried it myself yet, it claims
to be a "sophisticated grammar development environment"
(longer excerpt from blurb included below).

The relation to the current topic is that ANTLR's LL(*) parsing [1]
aims to generalize and optimize PEGs and related formalisms
typical for top-down parsers (while GLR and bison are typical for
bottom-up parsers, which are not relevant for ES implementations,
according to your information).

PEGs without left-recursion can apparently be translated to ANTLR
grammars, and PEGs with limited lookahead can be implemented
by ANTLR with close to LL(k) efficiency. So starting with a JS-based
PEG tool for in-browser experimentation, then working to limit
backtracking (using finite or regular lookahead for committed
choice) and keeping ANTLRWorks in mind as an analysis
framework sounds viable. It is the route I have chosen, so far.
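
To make the difference between backtracking and committed choice
concrete, here is a minimal sketch in TypeScript. It is not taken
from OMeta, ANTLR, or any existing PEG tool: the names Parser, lit,
seq, choice, predict, and run are made up for illustration, and the
statement forms are toy literals rather than the ES grammar. PEG-style
ordered choice tries each alternative in turn; the predictive variant
commits to one alternative after a single character of lookahead.

```typescript
// Hypothetical minimal parser API: a parser returns the position after
// a successful match, or null on failure.
type Parser = (input: string, pos: number) => number | null;

let steps = 0; // counts literal match attempts, as a rough cost measure

// Match a fixed string at the current position.
const lit = (s: string): Parser => (input, pos) => {
  steps++;
  return input.startsWith(s, pos) ? pos + s.length : null;
};

// Match all parsers one after another.
const seq = (...ps: Parser[]): Parser => (input, pos) => {
  let p = pos;
  for (const parser of ps) {
    const next = parser(input, p);
    if (next === null) return null;
    p = next;
  }
  return p;
};

// PEG-style ordered choice: try each alternative from the same
// position, restarting (backtracking) after every failure.
const choice = (...ps: Parser[]): Parser => (input, pos) => {
  for (const parser of ps) {
    const r = parser(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// Committed choice: pick one alternative from a single character of
// lookahead and never try the others.
const predict = (table: Map<string, Parser>): Parser => (input, pos) => {
  const alt = table.get(input.charAt(pos));
  return alt ? alt(input, pos) : null;
};

// Toy statement forms (assumed for illustration, not the ES grammar).
const ifStmt = seq(lit("if"), lit("(x)"), lit("y;"));
const forStmt = seq(lit("for"), lit("(;;)"), lit("y;"));
const whileStmt = seq(lit("while"), lit("(x)"), lit("y;"));
const returnStmt = seq(lit("return"), lit(" x;"));

const backtracking = choice(ifStmt, forStmt, whileStmt, returnStmt);
const predictive = predict(new Map<string, Parser>([
  ["i", ifStmt], ["f", forStmt], ["w", whileStmt], ["r", returnStmt],
]));

// Run a parser from position 0 and report the end position and cost.
function run(p: Parser, input: string): { end: number | null; steps: number } {
  steps = 0;
  const end = p(input, 0);
  return { end, steps };
}

console.log(run(backtracking, "return x;")); // { end: 9, steps: 5 }
console.log(run(predictive, "return x;"));   // { end: 9, steps: 2 }
```

One character of lookahead is enough here only because the toy
alternatives start with distinct characters. Where alternatives share
a prefix, a longer finite or regular lookahead would be needed before
committing.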

(ANTLR even has backends in various languages, though I doubt
that existing ES engine implementations will want to reengineer
their frontends on top of ANTLR-generated parsers.)

If there are other or better options for ES grammar experiments,
I hope this thread will bring them out. There are certainly other
"language workbenches" (Spoofax and MPS come to mind), but
do they offer a JS-based starting point, like the PEG route? Or
grammar debugging support?

The related work section of a paper on "Language Modularization
and Composition" [2] lists some alternatives, but recent ones
tend to be projectional (source is a view, projected from the AST
as a model) rather than parser-based (the AST is extracted from
source).

> The answer to (b) is "no". Implementors use tuned hand-crafted
> top-down parsers, not Ometa or anything like it.

The postings which triggered my inquiry were authored not by
implementers of the major ES engines, but by ES implementers
nevertheless. Some wanted to be able to play with proposed ES
extensions, to get a feel for existing proposals, or to prepare
more realistic suggestions of their own; others have a working
implementation of an ES security extension, but expressed
concern about the efficiency of their solution. My own interests
lie in providing tools for JS, and implementable suggestions
for ES.

Your own reaction suggested that you consider OMeta too
slow, and that you test your own extensions on grammar
subsets in bison, adding only the most promising ones to
full, handcrafted JS implementations. The overall impression
was that there is no viable option for practical full ES grammar
experiments, short of extending a full JS implementation, and
that the latter does not offer a validation route.

I want to express a counterpoint to those remarks, but I
need some input first: not having practical experience with
OMeta, I cannot say whether it is slower than one would
expect from the publications. If it is slower than the reference
I provided, based on experiments with other JS-based parser
tools, then it is likely that one can optimize the code or switch
to a more efficient parsing tool, without having to reimplement
the secure ES implementation or study parsing techniques.

If OMeta is already as fast as similar parsing tools, just too
slow for the secure ES project, I can still attest that such tools
are fast enough for smaller experiments with ES extensions.

Being based on explicit representations of the ES grammar,
they are also easier to understand and change than
hand-optimized parsers, and the grammars can often be translated
to the input forms expected by other tools (for verification,
performance, IDE support, or documentation/spec purposes).
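
As a rough illustration of what an explicit grammar representation
buys, the TypeScript sketch below keeps a tiny expression grammar as
plain data and derives two things from it: a printed form in a generic
EBNF-like notation, and a list of left-recursive rules (the property
that matters for the PEG-to-ANTLR route mentioned above). Everything
here is an assumption for illustration: the Expr type, the helper
names, the three rules, and the output notation, which is not claimed
to be valid input for ANTLR or any other tool.

```typescript
// Grammar expressions as plain data (hypothetical representation).
type Expr =
  | { kind: "lit"; text: string }
  | { kind: "ref"; name: string }
  | { kind: "seq"; items: Expr[] }
  | { kind: "alt"; items: Expr[] };

type Grammar = Map<string, Expr>;

const terminal = (text: string): Expr => ({ kind: "lit", text });
const rule = (name: string): Expr => ({ kind: "ref", name });
const sequence = (...items: Expr[]): Expr => ({ kind: "seq", items });
const oneOf = (...items: Expr[]): Expr => ({ kind: "alt", items });

// A toy fragment, loosely shaped like left-recursive spec-style rules.
const grammar: Grammar = new Map<string, Expr>([
  ["Additive", oneOf(
    sequence(rule("Additive"), terminal("+"), rule("Multiplicative")),
    rule("Multiplicative"))],
  ["Multiplicative", oneOf(
    sequence(rule("Primary"), terminal("*"), rule("Multiplicative")),
    rule("Primary"))],
  ["Primary", oneOf(
    terminal("x"),
    sequence(terminal("("), rule("Additive"), terminal(")")))],
]);

// Print an expression in a generic EBNF-like notation.
function render(e: Expr): string {
  switch (e.kind) {
    case "lit": return `'${e.text}'`;
    case "ref": return e.name;
    case "seq": return e.items.map(render).join(" ");
    case "alt": return "(" + e.items.map(render).join(" | ") + ")";
  }
}

function renderRule(g: Grammar, name: string): string {
  const body = g.get(name);
  if (!body) throw new Error("unknown rule: " + name);
  return `${name} : ${render(body)} ;`;
}

// Rule names that can appear first in a match of e. Terminals are
// assumed non-empty, so only the first item of a sequence counts.
function leadingRefs(e: Expr): string[] {
  switch (e.kind) {
    case "lit": return [];
    case "ref": return [e.name];
    case "seq": {
      const [first] = e.items;
      return first ? leadingRefs(first) : [];
    }
    case "alt": {
      const refs: string[] = [];
      for (const item of e.items) refs.push(...leadingRefs(item));
      return refs;
    }
  }
}

// A rule is left-recursive if it can reach itself through leading
// references, directly or via other rules.
function leftRecursiveRules(g: Grammar): string[] {
  const result: string[] = [];
  g.forEach((body, start) => {
    const seen = new Set<string>();
    const todo = leadingRefs(body);
    while (todo.length > 0) {
      const name = todo.pop() as string;
      if (name === start) { result.push(start); return; }
      if (seen.has(name)) continue;
      seen.add(name);
      const next = g.get(name);
      if (next) todo.push(...leadingRefs(next));
    }
  });
  return result;
}

console.log(renderRule(grammar, "Primary"));
// Primary : ('x' | '(' Additive ')') ;
console.log(leftRecursiveRules(grammar)); // [ 'Additive' ]
```

A hand-optimized parser offers no comparable handle: the same
questions would have to be answered by reading the parser code.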

>> Btw, it would be nice to have an es-tools-discuss list, where
>> one could discuss ES implementation and tooling issues
>> (preferably with the engine implementers listening in).
>> Does such a list exist?
>
> No, we're not going to split this list. Volume is up lately but
> nowhere near fatally high, and topics interconnect.
>
> Researching JS parsing is probably better done in a research
> setting. Talking about it here in open-ended ways doesn't
> help make near-term progress in the standardized language.
> Having a more closed or "done" solution that addresses (a)
> and (b) above would help, though.

The problem is not with research, it is with connecting research
to the ES design process. An open-ended discussion list for
theory, tools, and problems relevant to this process might attract
the researchers who have been working on ES semantics and
analysis, but currently remain silent or absent on this list.

Keeping the open-ended list separate from the spec-focused
list would provide a specialized venue for recording ES-specific
problems (which interested researchers might want to work on)
and solutions (the suitability of which researchers might want
to demonstrate).

To be interesting to researchers, such a venue has to be close
to the action (es-discuss), without being limited by the timeline
and tl;dr constraints of es-discuss, but with a demonstrable
chance of discussions having practical relevance and impact
(cross-pollination with es-discuss and language committee).

Claus

[0] http://www.antlr.org/works

from the blurb:
    It combines an excellent grammar-aware editor with an
    interpreter for rapid prototyping and a language-agnostic
    debugger for isolating grammar errors. ANTLRWorks helps
    eliminate grammar nondeterminisms, one of the most
    difficult problems for beginners and experts alike, by
    highlighting nondeterministic paths in the syntax diagram
    associated with a grammar. ANTLRWorks' goal is to make
    grammars more accessible to the average programmer,
    improve maintainability and readability of grammars by
    providing excellent grammar navigation and refactoring
    tools, and address the most common questions and
    problems encountered by grammar developers:
    - Why is this grammar fragment nondeterministic?
    - Does this rule match a sample input?
    - Why is this grammar improperly matching this complete input?
    - Why is there a syntax error given this input?
    - Why is there no syntax error given this ungrammatical input?

[1] "LL(*): The Foundation of the ANTLR Parser Generator"
    Terence Parr and Kathleen Fisher, Sat Feb 5, 2011 13:07
    [This is a draft of a paper accepted to PLDI 2011]
    http://www.antlr.org/papers/LL-star-PLDI11.pdf

[2] "Language Modularization and Composition with
    Projectional Language Workbenches illustrated with MPS"
    Markus Voelter, Konstantin Solomatov
    http://voelter.de/data/pub/VoelterSolomatov_SLE2010_LanguageModularizationAndCompositionLWBs.pdf
    (see section "4 Related Tools and Approaches")

> I'm not against research (Mozilla funds research), and we do
> need better methods over time. But developing them is
> specialized work best done in more specialized venues,
> without the more practical constraints that JS developers,
> interested experts, and TC39 members face in working on
> "ES" and discussing it.
>
> And if we can avoid research by using existing formalisms and
> tools, then we should. That's strictly better for standardization.
>
> /be

 


