this vs thi\u0073

Mike Samuel mikesamuel at gmail.com
Tue Jun 21 11:10:31 PDT 2011


2011/6/21 Allen Wirfs-Brock <allen at wirfs-brock.com>:
> ES5.1:
>
> 7.6.1   Reserved Words
>
> A reserved word is an IdentifierName that cannot be used as an Identifier.
>
> 7.6    Identifier Names and Identifiers
>
> Identifier Names are tokens that are interpreted according to the grammar
> given in the “Identifiers” section of chapter 5 of the Unicode standard,
> with some small modifications. An Identifier is an IdentifierName that is
> not a ReservedWord (see 7.6.1). The Unicode identifier grammar is based on
> both normative and informative character categories specified by the Unicode
> Standard. The characters in the specified categories in version 3.0 of the
> Unicode standard must be treated as in those categories by all conforming
> ECMAScript implementations.
>
> This standard specifies specific character additions: The dollar sign ($)
> and the underscore (_) are permitted anywhere in an IdentifierName.
>
> Unicode escape sequences are also permitted in an IdentifierName, where they
> contribute a single character to the IdentifierName, as computed by the CV
> of the UnicodeEscapeSequence (see 7.8.4). The \ preceding the
> UnicodeEscapeSequence does not contribute a character to the IdentifierName.
> A UnicodeEscapeSequence cannot be used to put a character into an
> IdentifierName that would otherwise be illegal. In other words, if a \
> UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's
> CV, the result must still be a valid IdentifierName that has the exact same
> sequence of characters as the original IdentifierName. All interpretations
> of identifiers within this specification are based upon their actual
> characters regardless of whether or not an escape sequence was used to
> contribute any particular characters.
>
> The red text would seem to say that \u0069f and if are the same reserved
> word.
> This may not match implementations but it is what the spec. says.
> ES3 didn't distinguish IdentifierName from Identifier but from a quick scan
> of the ES3 language I don't see that the spec. is any different in this
> regard.
> Also, given the pervasive substitution of Unicode escape sequences I don't
> see why they shouldn't be legal in reserved words.

"An Identifier is an IdentifierName that is not a ReservedWord (see
7.6.1)" seems to be a non-normative reference to the normative 7.6
production (the "see 7.6.1" is just a reference to the definition of
ReservedWord):

    Identifier :: IdentifierName (but not ReservedWord)

Your interpretation assumes that the "but not ReservedWord" language
in the Identifier production applies *after* the identifier has been
decoded, but that is not at all clear to me.
From the section you quoted, "Identifier Names are tokens", i.e.
sequences of SourceCharacters, so the token "\u0069f" is clearly
distinct from the token "if".

Since the normative "but not ReservedWord" appears in a lexical token
grammar it should apply at the token level before any interpretation
happens.
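
The two readings can be contrasted with a minimal sketch. The helper names
and the reserved-word subset below are hypothetical, chosen only for
illustration; they come from neither the spec nor any implementation.

```javascript
// Illustrative subset only, NOT the full ES5.1 ReservedWord list.
var RESERVED_SUBSET = ["if", "else", "this", "function", "var", "return"];

// Replace each \uXXXX escape with the character it denotes.
function decodeIdentifierEscapes(token) {
  return token.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Reading A: "but not ReservedWord" is tested against the raw source
// characters of the token, before any escape is interpreted.
function isReservedAtTokenLevel(token) {
  return RESERVED_SUBSET.indexOf(token) !== -1;
}

// Reading B: the test is applied to the decoded character sequence.
function isReservedAfterDecoding(token) {
  return RESERVED_SUBSET.indexOf(decodeIdentifierEscapes(token)) !== -1;
}

var token = "\\u0069f"; // the seven source characters \ u 0 0 6 9 f
console.log(isReservedAtTokenLevel(token));  // false
console.log(isReservedAfterDecoding(token)); // true
```

Under reading A the escaped token is an ordinary Identifier; under reading
B it is excluded as a ReservedWord.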

Assuming your interpretation is correct though,
   \u0069f
may be an IdentifierName corresponding to a reserved keyword, but
since IfStatement is defined in terms of

    IfStatement :
        if ( Expression ) Statement else Statement
        if ( Expression ) Statement

where "if" appears literally instead of any reference to an
IdentifierName whose decoded value is "if", would you agree that

   \u0069f(false)
   alert(1);

is not a valid EcmaScript program?  It should definitely not be
interpreted as an EcmaScript program containing an IfStatement.


In that case, we still have at least 3 (haven't tested IE) of 4 major
browsers agreeing that the illegal EcmaScript program

    this.\u0069\u0066 = function () { alert("called \u0069\u0066"); };

    \u0069\u0066(false)
    alert(1);

should be interpreted as a call via the reference "if" followed by a
call via the reference "alert".
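
One way to check a given engine is to ask whether it parses each snippet at
all. The sketch below assumes the Function constructor is available and not
blocked by policy; the helper name `parses` is hypothetical. It only reports
what the engine does, and the answers are expected to differ between engines
and versions, so no particular result is claimed here.

```javascript
// Returns true if this engine parses `source` without a SyntaxError.
// The Function constructor only parses the body; it is never called,
// so alert() does not need to exist.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false;
    }
    throw e; // anything else (e.g. a policy error) is not a parse result
  }
}

// "\\u" in these string literals yields a literal backslash-u in the source
// text handed to the parser.
var escapedIfStatement = "\\u0069f(false)\nalert(1);";
var escapedPropertyName = "this.\\u0069\\u0066 = function () {};";

var results = {
  escapedIfStatement: parses(escapedIfStatement),
  escapedPropertyName: parses(escapedPropertyName)
};
console.log(results);
```

A `true` for the first snippet says only that the engine accepts the text,
not whether it treated it as an IfStatement or as a call; telling those apart
requires actually running it with an "if" property defined, as in the
program above.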

