invalid escape sequences

Dave Fugate dfugate at microsoft.com
Wed Jun 1 09:10:49 PDT 2011


Results for IE9 ("IE9 standards" mode) given the snippet below:
	"\r" : "ERROR"
	"\\u" : "ERROR"
	"\\x" : "ERROR"
	"\\8" : "8"
	"\\28" : "\u00028"
	"\\228" : "\u00128"
	"\\3778" : "ÿ8"
	"\\478" : "'8"
	"\\778" : "?8"

My best,

Dave

-----Original Message-----
From: es-discuss-bounces at mozilla.org [mailto:es-discuss-bounces at mozilla.org] On Behalf Of Mike Samuel
Sent: Tuesday, May 31, 2011 6:34 PM
To: es-discuss
Subject: invalid escape sequences

During the last meeting, the semantics of "\z" came up.  Specifically, what does \ followed by a character not in the set with a specified escape expand to?

From 7.8.4 StringLiteral

    "
    EscapeSequence :: CharacterEscapeSequence
    "

leads to

    "
    CharacterEscapeSequence :: ...
        NonEscapeCharacter

    NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator
    "

and the semantics of NonEscapeCharacter is given thus

    "
    The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.
    "

so are the following assertions true?

(1)

The only SourceCharacter sequences that do not match ( DoubleStringCharacter | SingleStringCharacter ) applied one or more times are a LineTerminator not preceded by an odd number of backslashes, "u" preceded by an odd number of backslashes and not followed by 4 valid hex digits, "x" preceded by an odd number of backslashes and not followed by 2 valid hex digits, or a decimal digit preceded by an odd number of backslashes.
I.e. /(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[0-9])/
matches exactly when a sequence of SourceCharacters does not match zero or more ( DoubleStringCharacter | SingleStringCharacter ).

(2)

The B.1.2 additional octal syntax, quoted below, does change the validity of the test above.
    "
    OctalEscapeSequence :: OctalDigit [lookahead not in DecimalDigit]
        ZeroToThree OctalDigit [lookahead not in DecimalDigit]
        FourToSeven OctalDigit
        ZeroToThree OctalDigit OctalDigit
    "

NonEscapeCharacter excludes DecimalDigit through EscapeCharacter but OctalEscapeSequence allows [0-7].  So under B.1.2,
/(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[89]|\\[0-3][0-7]?(?![89])|\\[4-7](?![89]))/
matches exactly when a sequence of SourceCharacters does not match zero or more ( DoubleStringCharacter | SingleStringCharacter ).
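
As a rough check of assertion (1), its pattern can be run against the probe strings used in the test later in this message. This is only a sketch: it assumes the capture group is closed before the trailing slash, and INVALID_ES5 and looksInvalid are names invented here for illustration.

```javascript
// Pattern from assertion (1), written with balanced parentheses (assumption).
var INVALID_ES5 = /(?:^|[^\\])(?:\\\\)*([\r\n\u2028\u2029]|\\u(?![0-9A-Fa-f]{4})|\\x(?![0-9A-Fa-f]{2})|\\[0-9])/;

// Hypothetical helper: true when the body of a would-be string literal
// contains one of the sequences that assertion (1) calls invalid.
function looksInvalid(body) {
  return INVALID_ES5.test(body);
}

var probes = ["\r", "\\u", "\\x", "\\8", "\\28", "\\228", "\\3778", "\\478", "\\778"];
var flagged = probes.filter(function (p) { return looksInvalid(p); });
console.log(flagged.length + " of " + probes.length + " probes flagged");

// Well-formed bodies should not be flagged.
console.log(looksInvalid("\\u0041")); // false: complete unicode escape
console.log(looksInvalid("\\\\u"));   // false: escaped backslash, then a plain "u"
console.log(looksInvalid("abc"));     // false: no escapes at all
```

All nine probes are flagged, which agrees with the claim that none of them is a valid string literal body absent B.1.2.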



I did some empirical testing to see what is actually allowed by running the below in a variety of browsers in the squarefree shell.

var notStringLiterals = [ "\r", "\\u", "\\x", "\\8", "\\28", "\\228", "\\3778", "\\478", "\\778" ];
for (var i = 0; i < notStringLiterals.length; ++i) {
  var result;
  try {
    result = eval('"' + notStringLiterals[i] + '"');
  } catch (ex) {
    result = "ERROR";
  }
  print(JSON.stringify(notStringLiterals[i]) + " : " + JSON.stringify(result));
}

All are invalid absent B.1.2 if the assertions above are true.  With B.1.2, "\3778", "\478", and "\778" are valid.
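
To make the B.1.2 claim concrete, here is a minimal sketch that applies the four OctalEscapeSequence alternatives quoted above, lookahead restrictions included, to the digits that follow the backslash. The name octalEscapeLength is made up for this illustration, and the single combined pattern is one possible encoding of the grammar, not spec text.

```javascript
// One regex alternative per quoted B.1.2 production, longest first:
//   ZeroToThree OctalDigit OctalDigit
//   FourToSeven OctalDigit
//   ZeroToThree OctalDigit [lookahead not in DecimalDigit]
//   OctalDigit [lookahead not in DecimalDigit]
var OCTAL_ESCAPE = /^(?:[0-3][0-7][0-7]|[4-7][0-7]|[0-3][0-7](?![0-9])|[0-7](?![0-9]))/;

// Hypothetical helper: given the text after a backslash, return how many
// characters form an OctalEscapeSequence, or 0 if no alternative applies.
function octalEscapeLength(afterBackslash) {
  var m = OCTAL_ESCAPE.exec(afterBackslash);
  return m ? m[0].length : 0;
}

var digits = ["8", "28", "228", "3778", "478", "778"];
for (var i = 0; i < digits.length; ++i) {
  console.log("\\" + digits[i] + " : " + octalEscapeLength(digits[i]));
}
```

A result of 0 means no octal alternative applies, so the escape stays invalid. Only "3778", "478", and "778" yield a nonzero length (3, 2, and 2), leaving the trailing "8" as an ordinary character.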

I'm having trouble running IE today, but on other browsers, in alphabetical order:

Chrome
"\r" : "ERROR"
"\\u" : "u"
"\\x" : "x"
"\\8" : "8"
"\\28" : "\u00028"
"\\228" : "\u00128"
"\\3778" : "ÿ8"
"\\478" : "'8"
"\\778" : "?8"


FF3
"\u000d" : "ERROR"
"\\u" : "u"
"\\x" : "x"
"\\8" : "8"
"\\28" : "\u00028"
"\\228" : "\u00128"
"\\3778" : "ÿ8"
"\\478" : "'8"
"\\778" : "?8"


Safari
"\r" : "ERROR"
"\\u" : "u"
"\\x" : "x"
"\\8" : "8"
"\\28" : "\u00028"
"\\228" : "\u00128"
"\\3778" : "ÿ8"
"\\478" : "'8"
"\\778" : "?8"


So at least 3 different interpreter strains treat "\u" === "u", "\x" === "x", "\8" === "8", and don't care whether there is a decimal digit after an octal escape sequence.  All reject unescaped newlines in string literals.
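
The behavior shared by Chrome, FF3, and Safari can be summarized as a small decoder. This is only a sketch inferred from the nine results above, not code from any engine; decodeLenient and SINGLE_ESCAPES are hypothetical names, and the handling of inputs outside those nine probes is an assumption.

```javascript
var SINGLE_ESCAPES = { b: "\b", t: "\t", n: "\n", v: "\v", f: "\f", r: "\r" };

// Hypothetical decoder for the body of a string literal (quotes removed).
function decodeLenient(body) {
  var out = "";
  var i = 0;
  while (i < body.length) {
    var ch = body.charAt(i);
    if (/[\r\n\u2028\u2029]/.test(ch)) {
      throw new SyntaxError("unescaped line terminator");
    }
    if (ch !== "\\") {
      out += ch;
      i += 1;
      continue;
    }
    var rest = body.slice(i + 1);
    if (rest === "") throw new SyntaxError("dangling backslash");
    var m;
    if ((m = /^u([0-9A-Fa-f]{4})/.exec(rest)) ||
        (m = /^x([0-9A-Fa-f]{2})/.exec(rest))) {
      // Complete unicode or hex escape.
      out += String.fromCharCode(parseInt(m[1], 16));
      i += 1 + m[0].length;
    } else if ((m = /^(?:[0-3][0-7]{0,2}|[4-7][0-7]?)/.exec(rest))) {
      // Greedy octal escape; a following decimal digit is not checked.
      out += String.fromCharCode(parseInt(m[0], 8));
      i += 1 + m[0].length;
    } else if ((m = /^(?:\r\n|[\r\n\u2028\u2029])/.exec(rest))) {
      // Line continuation contributes nothing.
      i += 1 + m[0].length;
    } else {
      // Named single escape, or any other character standing for itself
      // (this is where "\u" -> "u", "\x" -> "x", "\8" -> "8" come from).
      var c = rest.charAt(0);
      out += SINGLE_ESCAPES.hasOwnProperty(c) ? SINGLE_ESCAPES[c] : c;
      i += 2;
    }
  }
  return out;
}

console.log(JSON.stringify(decodeLenient("\\228")));
console.log(JSON.stringify(decodeLenient("\\478")));
```

Fed the nine probe strings, this sketch reproduces the Chrome, FF3, and Safari columns; it does not reproduce IE9, which rejects "\u" and "\x".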


I would like to be able to specify quasiliteral literal part decoding in terms of the SV defined in 7.8.4.  If user code is going to have decoded literal parts available when they validly decode, but at least have access to the raw literal parts otherwise, then it would be good for them to be consistently available across interpreters.  Would it be worthwhile having the SV and CV in 7.8.4 specify the decoding of some SourceCharacter sequences that can't actually reach the SV or CV via the StringLiteral production?