ES4 implementation process, teams, and meetings

Graydon Hoare graydon at
Thu Feb 21 19:24:38 PST 2008

Maciej Stachowiak wrote:

> We're unlikely to have much interest in working on implementing the RI. 

Ok. I'm sorry to hear that, but I understand.

> As for reading the RI, it seems a lot harder to understand than specs 
> written in prose. As far as I can tell, only people who have coded 
> significant portions understand it.

Fair enough. Comprehensibility is a good part of the measurable value of 
a spec, so if the code prohibits that, we are in an undesirable state.

I wonder -- I do not mean to offend here -- if this is partly "sticker 
shock" at the initial barrier, which is simply that you have to digest 
SML, and you haven't read it before. It is not a terribly hard language 
to learn: it consists of value bindings, function expressions, 
function-application expressions, case expressions with destructuring 
pattern matching, if/then/else expressions, and a very small algebraic 
type system (function types, named types, records, disjoint sums, and 
sugar for lists).
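
To give a sense of scale, here is a toy fragment that touches each of 
those forms once. It is not taken from the RI; every name in it (point, 
shape, area, sumPositive and so on) is invented for illustration only.

```sml
(* Toy fragment, not from the RI: every name here is invented. *)

(* named type, record type, disjoint sum *)
type point = { x: real, y: real }
datatype shape = Circle of point * real
               | Rect of point * point

(* value binding *)
val origin : point = { x = 0.0, y = 0.0 }

(* function expression, bound to a name *)
val double = fn (n:int) => n * 2

(* case expression with destructuring pattern matching *)
fun area (s:shape) : real =
    case s of
        Circle (_, r) => 3.14159 * r * r
      | Rect ({x = x1, y = y1}, {x = x2, y = y2}) =>
            abs (x2 - x1) * abs (y2 - y1)

(* if/then/else, function application, and the list sugar *)
fun sumPositive (ns:int list) : int =
    case ns of
        [] => 0
      | n :: rest =>
            if n > 0
            then n + sumPositive rest
            else sumPositive rest

val total = sumPositive [1, ~2, 3]
```

That is more or less the whole vocabulary you need in order to read the 
simplified parts of the code.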

I also intended to do -- and have gradually been doing -- a conversion 
to the simplest possible syntactic forms of SML I could write, unpacking 
any syntactic short-hands or dense, idiomatic phrases that might have 
turned up during the more intense implementation stages. This is similar 
to rewriting English paragraphs for clarity, and is easy work for 
someone who speaks the language. It also parallelizes easily. I believe this 
will help with the legibility significantly: for example, read 
evalCondExpr and tell me if it's illegible:

     fun evalCondExpr (regs:Mach.REGS) cond thn els
         : Mach.VAL =
         let
             val v = evalExpr regs cond
             val b = toBoolean v
         in
             if b
             then evalExpr regs thn
             else evalExpr regs els
         end

One of my goals -- which I have surely not achieved yet -- is for most 
of the RI to be distilled to this very pedestrian dialect.

> On the one hand, it's useful to have a reference implementation to 
> validate what is being done, explore ideas, have something to test and 
> compare against, etc.
> But yes I think it is an incredibly bad idea for the only specification 
> to be a computer program. It's not approachable. I don't think I could 
> quickly grok a program of this complexity even in a programming language 
> I am familiar with. And by its nature it does not very cleanly partition 
> separate concepts. For example, below you pointed me to 6 places in the 
> code for "let" statements, and I doubt that reading those functions 
> alone will be enough to understand it. So in practice, I don't think 
> there is any way to understand "let" in detail without asking you or 
> another expert on the RI.

These are ... points I nearly agree with, but not quite, and at the risk 
of being terribly long-winded I'd like to air the discussion a bit in 
public here, if we can back off from worrying that I'm saying anything 
about the schedule of auxiliary-doc-generation (which I've hopefully 
addressed in the other email):

First I want to point out that there is no established "right way" to 
publish language specifications. Language specifications range in style 
and formalisms employed. People frequently need to study the spec, and 
implementations, and formal treatments in e.g. proof assistants or 
reduced semantic models, *and* do impl-to-impl compatibility bakeoffs. 
And it still sometimes takes many years, many revisions, to nail down 
what people actually agree on or disagree on, what's "in" the language 
or "out" of it. Sometimes it takes 5 or 10 years to discover a horrible 
unsoundness in the language (or, gasp, that you accidentally made the 
type system Turing-complete!)

No one approach is proven to "work". Not yet.

The AS3 draft spec we were looking at two winters ago had sections 
containing pseudo-C++ code, as a way of describing relevant data 
structures. ES3 has pseudo-assembly that has typos and nonsensical 
parts in addition to requiring readers to execute goto statements in 
their head to understand the flow of a rule. R6RS, for a different 
example, shipped most recently with a PLT Redex operational semantic 
model to accompany and illuminate it. We considered using PLT Redex too, 
and in fact rejected it in part out of the belief (perhaps mistaken!) 
that "normal" programmers would find a "normal" language like SML easier 
to read than one from the more academic setting of operational semantics 
descriptions. Possibly in the future (as the POPLMark challenge is 
hoping to establish) a standard metatheory will solidify for semantics 
such that machine-checked evaluation rules are no less common than 
machine-checked grammars in EBNF. But we're not there yet, so we picked 
something that seemed like it might help, and in at least some senses 
(see next point) it did.

Second I want to point out that while much of the value of a spec is in 
informing/transmitting information from designers to implementors, a 
fair portion of the value is also in agreeing/deciding what the various 
spec-stakeholders wish, and mean, in their own minds and their own 
efforts. I am certain, from recollection, that one of our motivations in 
pursuing an RI-focused strategy at all was the fear that we were 
producing incoherent ideas: that we all had ideas of what we'd like, but 
writing them in English side-by-side (or arguing them across a table) 
simply didn't force all the horrifying details of their semantic 
incompatibility to manifest. Even if the RI turns out to be a throw-away 
artifact -- not useful for the "informing" role of a spec -- I believe 
it has helped quite a bit in crystallizing ideas and helping us tinker 
toward agreements.

Third I should make clear that IIRC nobody on the committee ever 
articulated a belief that the SML would be the "only specification" of 
ES4. If anyone did it would have been me, and even I'm not *that* 
deluded. We have entertained the notion -- I'll admit to promoting it -- 
that excerpts of the SML, or some machine-translations of those 
excerpts, may wind up constituting part of the normative text, since the 
"by hand" expansion of many evaluation rules reads a *lot* like the 
machine translation of the simplest SML form. See evalCondExpr above as 
an example.

The jury is still out on whether that may occur -- some standards bodies 
apparently dislike the smell of it, which I find remarkable considering 
how many other formalisms (box diagrams, equations, grammars) smell just 
fine -- but we all know that no matter what happens to the SML there 
will need to be plenty of less-specific accompanying narrative written 
at some point. What you and I are discussing now is whether you can 
*presently* (rather than "at some point") extract enough of the meaning 
you require for early-implementation work from the SML. If you can't, we 
need to move up the schedule on some of the accompanying narrative. Fair 
enough. Maybe mbedthis got lucky, or have a higher pain tolerance :)

Finally, I think it is unfair to complain that there are 6 places "let" 
affects. Programming languages are, as you know, highly integrated and 
inter-related affairs. Show me a language spec in *any* formalism that 
can get away with treating "one feature" (that is not syntactic sugar) 
only once, and never discussing it again. Not likely.

> I asked someone who knows SML to look at it and he found the code pretty 
> opaque as well, perhaps due in part to the very terse variable names and 
> occasionally obscure concepts. (I still don't understand what a "runtime 
> type rib" is, and searching for other references in the file does not 
> elucidate, so I'm not sure reading backwards would help.)

I might as well mention what this means in passing, though I surely get 
your meaning by now. Extra guidance and terminology can't hurt.

Ribs are lists of (fixture name * fixture) pairs. A rib represents the 
set of fixtures that we know *will be* present in a runtime structure, 
any time we build it. The instance rib of a class describes its fixed 
instance variables. The rib of a function or block describes its fixed 
activation variables. The definition phase of the RI lowers everything 
"slot-ish" to sets of fixtures arranged into Ast.RIBs, just as the 
machine model treats everything "slot-ish" as name->property maps.
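
A rough picture of the shape, using stand-in types rather than the real 
Ast definitions: FIXTURE_NAME and FIXTURE below are placeholders made up 
for this sketch (the real ones are richer), and findFixture is a 
hypothetical helper, not a function in the RI.

```sml
(* Stand-in types for illustration only; not the RI's Ast definitions. *)
type FIXTURE_NAME = string
datatype FIXTURE = ValFixture of { ty: string, readOnly: bool }
                 | MethodFixture of { arity: int }

(* A rib: a list of (fixture name * fixture) pairs. *)
type RIB = (FIXTURE_NAME * FIXTURE) list

(* Instance rib of a hypothetical class Point with two fixed fields. *)
val pointInstanceRib : RIB =
    [ ("x", ValFixture { ty = "double", readOnly = false }),
      ("y", ValFixture { ty = "double", readOnly = false }) ]

(* Hypothetical helper: find a fixture by name, if the rib has one. *)
fun findFixture (rib:RIB) (name:FIXTURE_NAME) : FIXTURE option =
    case rib of
        [] => NONE
      | (n, f) :: rest =>
            if n = name
            then SOME f
            else findFixture rest name
```

The point of the sketch is only that a rib is plain, immutable data 
known before anything runs.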

The type system -- in fact the entire AST -- is immutable and discusses 
only the plan of execution, not the runtime artifacts. So if you want to 
call a type function, you need to provide things like RIBs not things 
like PROP_BINDINGS. Type rules wouldn't know what to do with runtime 
artifacts like the latter, even if the type rules were being called 
*from* runtime. They're *defined over* immutable compile time artifacts.

So the type normalizer -- that part of the type system that converts 
type names to type definitions, applies parametric types, and shuffles 
structural types around to a normal form -- requires a set of RIBs to do 
anything with.

The function you were looking at is a support function for the runtime 
invocation of the type normalizer. The normalizer can be and is invoked 
at compile time too, but this is what happens when you invoke it at 
runtime: it takes a runtime scope chain and extracts the type-relevant 
ribs the scopes were built from, in order to reconstruct the appropriate 
environment for the normalizer.
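
In sketch form, and only as an approximation: the SCOPE type and the 
ribsOfScope function below are invented for this email and are not the 
RI's definitions. Fixtures are flattened to strings to keep it short, 
and I assume each scope optionally remembers the rib it was built from.

```sml
(* Stand-ins for illustration; not the RI's definitions. *)
type RIB = (string * string) list       (* name * fixture, simplified *)

(* A runtime scope: the rib it was built from (if type-relevant)
   and a link to the enclosing scope. *)
datatype SCOPE = Scope of { rib: RIB option, parent: SCOPE option }

(* Walk the scope chain from innermost outward, collecting the ribs. *)
fun ribsOfScope (scope:SCOPE) : RIB list =
    case scope of
        Scope { rib, parent } =>
            let
                val outer = case parent of
                                NONE => []
                              | SOME p => ribsOfScope p
            in
                case rib of
                    NONE => outer
                  | SOME r => r :: outer
            end

(* Example chain: global scope <- scope with no rib <- function body. *)
val global = Scope { rib = SOME [("Object", "class")], parent = NONE }
val middle = Scope { rib = NONE, parent = SOME global }
val inner  = Scope { rib = SOME [("x", "var")], parent = SOME middle }
val env    = ribsOfScope inner
```

The resulting list of ribs, innermost first, is the kind of environment 
the normalizer can work with.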

(This particular aspect of normalization is not required unless you 
implement parametric types, in which type environments can be captured 
and moved around)

> "let" was meant to be an illustration that it's prohibitively difficult 
> to get the needed info without having inside knowledge. If I were to 
> generalize this approach to help someone understand another ES4 feature, 
> I would probably just say "ask Graydon". Is it truly acceptable to have 
> a spec where that's the easiest way to understand it?

Of course not, and insofar as we may have reached that point (I hope we 
have not) it would be unsatisfactory to me as well.

