Making the identifier identification strawman less restrictive

Norbert Lindenberg ecmascript at lindenbergsoftware.com
Tue Oct 8 22:48:43 PDT 2013


On Oct 6, 2013, at 6:01, Mathias Bynens <mathias at qiwi.be> wrote:

> This is about the identifier identification strawman: http://wiki.ecmascript.org/doku.php?id=strawman:identifier_identification
> 
> For tooling, it’s better to have a false positive than to have a false negative. In the case of identifier identification, it’s more useful to flag an identifier that is permitted as per the latest Unicode version as valid instead of rejecting it, even if it’s perhaps not supported in some engines that use data tables based on older Unicode versions.

I think that depends on the kind of tool you're writing:

- For a code transformation tool, such as CoffeeScript, I agree that you probably don't want to introduce any artificial restrictions, so you want to use the latest Unicode version possible. Step 10 of the proposed algorithm ("let unicode be the Unicode version supported by the implementation in ECMAScript identifiers") is intended to cover that case.

- For a code checker such as JSHint, it's probably useful to be able to verify that code runs on all conforming implementations of a specific ECMAScript edition, and that's only guaranteed for the minimum Unicode version required by that edition. ECMAScript 5 implementations are not required to support Unicode 6.3.0, not even its BMP subset.
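
The two cases can be sketched roughly as follows. This is only an illustration, not text from the strawman: `unicodeVersionFor` and `IMPLEMENTATION_UNICODE` are hypothetical names, the value "6.3" is a placeholder for whatever the host engine actually supports, and the sketch assumes step 10 applies when the caller requests no particular edition.

```javascript
// Hypothetical stand-in for the Unicode version that the host engine's
// identifier tables are based on (step 10). "6.3" is just a placeholder.
const IMPLEMENTATION_UNICODE = "6.3";

// Hypothetical helper: picks the Unicode version a tool validates against.
function unicodeVersionFor(edition) {
  // Step 8 as currently proposed: ES 3 and ES 5 only guarantee Unicode 3.0,
  // which is what a checker such as JSHint would want to test against.
  if (edition === 3 || edition === 5) {
    return "3.0";
  }
  // Assumption: with no edition requested, use whatever the implementation
  // supports, which is what a code transformation tool would want.
  if (edition === undefined) {
    return IMPLEMENTATION_UNICODE;
  }
  // Other editions (e.g. step 9) are deliberately left out of this sketch.
  throw new RangeError("edition not covered by this sketch: " + edition);
}
```

A checker would call `unicodeVersionFor(5)` and get the guaranteed minimum, while a transformation tool would call `unicodeVersionFor()` and get the most recent version available to it.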

> In general, tools try to be lenient rather than restrictive in the input they accept. The list of ECMAScript 5 parsers that handle non-ASCII symbols in identifiers in the strawman backs this up: instead of using Unicode 3.0.0 data, more recent Unicode versions are used, in an attempt to handle as many technically valid identifiers as possible.

In the case of JSHint, I think that's problematic - see above.

> For these reasons, I’d suggest changing the identifier identification proposal as follows. Step 8 currently says:
> 
>> If `edition` is `3` or `5`, let `unicode` be `3.0`.
> 
> Change that into step 8a:
> 
>> If `edition` is `3`, let `unicode` be `3.0`.
> 
> Then, add a new step `8b`:
> 
>> If `edition` is `5`, let `unicode` be `6.3`.

That would create several problems:

- The Unicode version for ES 5 would be above that for ES 6 (step 9).

- Tools like JSHint, if they want to ensure compatibility with all ES 5 implementations, would have to lie and specify ES 3.

- Step 11 would allow all Unicode code points that are matched by the IdentifierStart production, including supplementary code points, which ES 5 does not permit in identifiers. (Note that Unicode 3.0, the version referenced by the ES 3 and ES 5 specs, was the last one that did not define any supplementary characters, so the strawman as proposed doesn't have that problem.)

- Implementations that don't support Unicode 6.3 yet, e.g., because they rely on Unicode information provided by the operating system, would not be able to comply with the spec.

- When the next version of Unicode is published, a spec referencing 6.3 would be obsolete just like one referencing 3.0.
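
To make the supplementary code point issue concrete, here is a minimal sketch. The helper names `isSupplementary` and `toUtf16` are hypothetical and not part of the strawman. U+20000 is used as the example because it is a CJK ideograph that was added in Unicode 3.1, so it is not covered by Unicode 3.0.

```javascript
// A supplementary code point lies outside the Basic Multilingual Plane.
function isSupplementary(codePoint) {
  return codePoint > 0xFFFF && codePoint <= 0x10FFFF;
}

// Encodes a code point as a UTF-16 string. Supplementary code points become
// a surrogate pair, i.e. two code units rather than one.
function toUtf16(codePoint) {
  if (!isSupplementary(codePoint)) {
    return String.fromCharCode(codePoint);
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // lead surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // trail surrogate
  return String.fromCharCode(high, low);
}

// U+20000 is a single code point, but two UTF-16 code units.
const ideograph = toUtf16(0x20000);
console.log(isSupplementary(0x20000)); // true
console.log(ideograph.length);         // 2
```

A tool that works on UTF-16 code units sees the two surrogates separately, so it has to combine them into one code point before it can look the character up in Unicode data at all.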

Norbert

