arrow syntax unnecessary and the idea that "function" is too long
Brendan Eich
brendan at mozilla.com
Fri May 13 14:52:57 PDT 2011
On May 7, 2011, at 2:39 PM, Claus Reinke wrote:
>> Consider this: w = (x)->y || z
>> That code is not obvious at all. Which of these would it be?
>> 1: w = function (x) { return y } || z
>> 2: w = function (x) { return y || z }
>> It seems to me that there must be some sort of delineation around the function start and end.
Either that, or around the function body (expression or statement).
> But such delineation does not have to lead to required syntax
> in the function construct. Such required delineations were not
> part of lambda-calculus originally **, they arose out of implementation limitations that can be overcome.
We're talking JS syntax here, not anything like the original Lambda calculus or s-expressions.
> I know that takes some getting used to (been there myself*), but there is an established standard, both in publications on
> programming language theory, and in some real languages:
> function bodies extend as far as possible **
>
> So your examples would be
>
> 1: w = (function (x) return y) || z
> 2: w = (function (x) return y || z)
This does not work, because the ECMA-262 grammar simplifies to something like this toy bison grammar:
%token FUNCTION
%token NUMBER
%%
Expr: Expr '+' Term
| Term
;
Term: Term '*' Fact
| Fact '(' ')'
| Fact
;
Fact: FUNCTION '(' ')' '{' Expr '}'
| '(' Expr ')'
| NUMBER
;
Notice this grammar parses function expressions with empty parameter lists, and function calls with empty argument lists. No statements, only expressions, and only numbers with addition and multiplication (and numbers are syntactically callable, as in JS), but the essential grammatical structure remains.
Now let's try to extend it to allow an unbraced body for the analogue of ES5's PrimaryExpression, here called Fact:
%token FUNCTION
%token NUMBER
%%
Expr: Expr '+' Term
| Term
;
Term: Term '*' Fact
| Fact '(' ')'
| Fact
;
Fact: FUNCTION '(' ')' Fact
| '(' FUNCTION '(' ')' Expr ')'
| '(' Expr ')'
| NUMBER
;
That is, we allow an unparenthesized high-precedence expression (a Fact) as the body of a function expression, or else require parentheses around an entire function expression whose body is an unparenthesized low-precedence expression (an Expr).
The results in reduce/reduce conflicts:
state 23
4 Term: Fact . '(' ')'
5 | Fact .
6 Fact: FUNCTION '(' ')' Fact .
'(' shift, and go to state 13
'+' reduce using rule 5 (Term)
'+' [reduce using rule 6 (Fact)]
'*' reduce using rule 5 (Term)
'*' [reduce using rule 6 (Fact)]
'(' [reduce using rule 6 (Fact)]
')' reduce using rule 5 (Term)
')' [reduce using rule 6 (Fact)]
$default reduce using rule 5 (Term)
In other words, we've created a hopeless ambiguity between an unparenthesized function expression with a high-precedence body expression and a parenthesized-on-the-outside function expression with a low-precedence body.
There's a shift/reduce conflict too, but that is not the concern here. The problem is that no amount of lookahead can decide by which production to reduce. We must give up one of the two function expression forms.
One might be tempted to make an unparenthesized (on the outside) function expression with an unparenthesized expression body be a low-precedence expression (say, AssignmentExpression), but then it can't be used as the callee part of a CallExpression without being parenthesized. Such a low-precedence function expression would have to be parenthesized with every operator save comma.
In practice, function expressions aren't operated on except by being on the right-hand side of assignment, by calling them to make a CallExpression, and by concatenating their string conversions, but requiring parens for all sub-expression uses of function expressions is a big break from ECMA-262's grammar since ES3, which introduced function expressions as PrimaryExpressions (highest precedence).
Here is a more complete toy ES sub-grammar, still not too big:
%token FUNCTION
%token IDENT
%token NUMBER
%token PLUSPLUS
%%
Expr: Expr '+' Term
| Term
;
Term: Term '*' Fact
| Fact
;
Fact: PLUSPLUS Fact
| Post
;
Post: Call PLUSPLUS
| Call
;
Call: Call '(' ')'
| Prim
;
Prim: FUNCTION '(' ')' '{' Expr '}'
| '(' Expr ')'
| IDENT
| NUMBER
;
If we make the mistake ES4 made (implemented in SpiderMonkey still) of adding a Prim(aryExpression) right-hand side:
Prim: FUNCTION '(' ')' Expr
to allow unbraced expression bodies for function expressions, the low-precedence Expr at the end of high-precedence Prim makes a nasty loop in the grammar. Waldemar pointed this out a while ago (http://www.mail-archive.com/es-discuss@mozilla.org/msg01008.html). His counter-example used a "let" expression, but there's a function expression equivalent:
function () a ? f : x++();
Per the full ES grammar, and the toy just above, you cannot call a postfix-++ expression. But the whole of |function () a ? f : x++| should be a function expression with an unparenthesized Expr body, per the added right-hand side. And a function expression should be callable without having to be parenthesized, since it is a Prim(aryExpression).
This pretty much dooms arrow syntax to either requiring parentheses or braces around the arrow-function's body, always (one or the other; there's a separate issue with object initialisers Doug raised, I'll deal with it in a separate message); or else require parentheses or some other bracketing (as Isaac pointed out re: Ruby-ish blocks) on the outside. No way to be paren-free.
BTW, "maximal munch" refers to lexing, a DFA thing. Not to parsing regular languages, which requires a stack to decide how to recognize sentences in the language based on their prefixes and one or more tokens of lookahead.
Anyway, even though JS's grammar does not allow us to reason about function syntax as if we had the lambda calculus, I appreciated your attempt to deal with the cases Isaac came up with. I wish we could parse a low-precedence unparenthesized function body expression. It does not look feasible.
/be
More information about the es-discuss
mailing list