Question about allowed characters in identifier names

Norbert Lindenberg ecmascript at lindenbergsoftware.com
Sat Aug 24 19:17:16 PDT 2013


On Aug 24, 2013, at 5:42, Mathias Bynens <mathias at qiwi.be> wrote:

> To clarify: consider what the Identifier Identification strawman[1] or any scripts that emulate similar behavior should do if Allen’s suggestion were implemented:
> 
>    String.isIdentifierStart('\uD87E\uDC00'); // should be `false`
>    String.isIdentifierStart('\u{2F800}'); // should be `true`
>    // this is impossible, since `'\uD87E\uDC00' === '\u{2F800}'` and there is no way to distinguish these strings
> 
> [1] http://wiki.ecmascript.org/doku.php?id=strawman:identifier_identification

On Aug 24, 2013, at 14:19, Mathias Bynens <mathias at qiwi.be> wrote:

> I just want to make sure it’s possible to write a polyfill (in ES5) for the `String.isIdentifier{Start,Part}` strawman. As long as `String.isIdentifierStart('\uD87E\uDC00')` and `String.isIdentifierStart('\u{2F800}')` are expected to return different results (as Allen suggests), this is impossible.

Allen didn't discuss these functions; the strawman didn't exist during the previous round of this discussion. Your code uses string literals, and in ES6 the string literals '\uD87E\uDC00' and '\u{2F800}' evaluate to the same string, so '\uD87E\uDC00' === '\u{2F800}'. This means the functions proposed in my Identifier Identification strawman cannot tell the difference, but then the specification doesn't require them to.
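
For illustration, a short sketch of why no function can tell the two literals apart: both escape forms produce the same two UTF-16 code units before any function sees them. The variable names are mine, not from the strawman.

```javascript
// Two spellings of the same string: a surrogate pair and a code point escape.
const viaSurrogates = '\uD87E\uDC00';
const viaCodePoint = '\u{2F800}';

console.log(viaSurrogates === viaCodePoint);            // true
console.log(viaCodePoint.length);                       // 2 (two UTF-16 code units)
console.log(viaCodePoint.codePointAt(0).toString(16));  // "2f800"
```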

What Allen suggested, and the current ES6 spec says, is that identifiers in source text using different Unicode escape forms behave differently:
   var \uD87E\uDC00;
throws a SyntaxError, while
   var \u{2F800};
declares a variable.
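
A rough way to observe this, assuming an engine that implements the behavior described above: hand each declaration to the Function constructor as source text and see whether it parses. The helper name `parses` is hypothetical, and the doubled backslashes are needed so that the escapes reach the parser instead of being decoded by the string literal.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
function parses(source) {
  try {
    new Function(source); // compiles the source without running it
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else is unexpected
  }
}

// The parser sees the escape sequences themselves.
console.log(parses('var \\u{2F800};'));      // code point escape form
console.log(parses('var \\uD87E\\uDC00;'));  // surrogate pair escape form
```

On an engine that follows the rule above, the first call should report true and the second false.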

I don't think that's a technical problem. String.isIdentifier{Start,Part}, as I proposed them, don't deal with actual identifiers in source text; they check individual identifier characters. The functions are intended to be called by a parser, and it's up to the parser to deal with escaping rules, throwing exceptions or unescaping as specified before passing code points to String.isIdentifier{Start,Part}. Calling the functions with string literals doesn't seem like a realistic use case.
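
To make the division of labor concrete, here is a minimal sketch of a parser step that applies the escaping rules itself and only then consults a character-level check. Everything in it is an assumption for illustration: `isIdentifierStart` is a stand-in for the proposed function (taking a numeric code point and built on the `\p{ID_Start}` regular expression property escape, which postdates this discussion), and `scanEscapedIdentifierStart` is a hypothetical helper, not part of any proposal.

```javascript
// Stand-in for the proposed String.isIdentifierStart (assumption: numeric code point argument).
function isIdentifierStart(codePoint) {
  if (codePoint === 0x24 || codePoint === 0x5F) return true; // "$" and "_"
  return /^\p{ID_Start}$/u.test(String.fromCodePoint(codePoint));
}

// Hypothetical parser step: decode ONE Unicode escape at the start of `source`
// and check the resulting code point on its own.
function scanEscapedIdentifierStart(source) {
  const match = /^\\u(?:\{([0-9A-Fa-f]+)\}|([0-9A-Fa-f]{4}))/.exec(source);
  if (!match) throw new SyntaxError('expected a Unicode escape');
  const codePoint = parseInt(match[1] || match[2], 16);
  if (codePoint > 0x10FFFF || !isIdentifierStart(codePoint)) {
    throw new SyntaxError('escape does not denote an identifier start');
  }
  return codePoint;
}

console.log(scanEscapedIdentifierStart('\\u{2F800}').toString(16)); // "2f800"
// scanEscapedIdentifierStart('\\uD87E\\uDC00') throws: \uD87E alone is a lone surrogate.
```

The character-level check never sees which escape form was used; rejecting the surrogate pair form is entirely the parser's decision, because it checks each four-digit escape individually.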

I do think it's a problem in learning and understanding the language. Having different rules for \uD87E\uDC00 in string literals and identifiers, and therefore also for identifiers embedded in strings passed to eval(), adds yet another of those random inconsistencies that already litter ECMAScript, and ensures a "wat" moment for everybody who comes across them.
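
A sketch of how that inconsistency might surface with eval(), again assuming an engine that follows the rule described above. The helper name `evalSucceeds` is hypothetical. The only difference between the two inputs is whether the string literal or the identifier grammar gets to interpret the escapes.

```javascript
// Hypothetical helper: reports whether eval() accepts `source`.
function evalSucceeds(source) {
  try {
    eval(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else is unexpected
  }
}

// The string literal decodes the escapes, so eval() sees the character U+2F800 itself.
const decodedByLiteral = 'var \uD87E\uDC00;';
// The doubled backslashes survive, so eval() sees the escapes inside an identifier.
const escapesReachParser = 'var \\uD87E\\uDC00;';

console.log(evalSucceeds(decodedByLiteral));
console.log(evalSucceeds(escapesReachParser));
```

On such an engine the first call should report true and the second false, even though the two lines look almost identical to a reader.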

On a side note, the strawman hasn't been discussed by TC39 and hasn't been accepted for either ES6 or ES7, so it may be a bit premature to polyfill it. Informal feedback from some members indicated that they'd rather discuss it in the context of a complete proposal for Unicode character properties support.

Norbert


More information about the es-discuss mailing list