5 June 2014 TC39 Meeting Notes

C. Scott Ananian ecmascript at cscott.net
Fri Jun 13 09:06:27 PDT 2014


On Thu, Jun 12, 2014 at 11:11 AM, Domenic Denicola
<domenic at domenicdenicola.com> wrote:
> I guess part of it is clarifying which part of "<script>'s insane parsing
> rules" we're talking about. From what I'm aware of there are quite a lot of
> different insanities; but I am fuzzy on the details. Does anyone know which
> rules are inherently necessary, and which are historical accidents or
> constraints?

I'll recap the rules for "script data state" from
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-state

As a general rule, `\r` and `\r\n` are converted to `\n`, and `\0` is
not allowed (it is a parse error and is replaced with U+FFFD).
The case-insensitive sequence `</script` followed by a character in
`[ \t\r\n\f/>]` terminates the script data section.
(These constraints would be present for any HTML embedding.)
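
To make the two basic rules concrete, here is a rough Python sketch. The
names `normalize_newlines` and `naive_script_end` are mine, not from the
spec, and the sketch deliberately ignores the `<!--` escaping described
next, so it only covers the simple case.

```python
import re

# "</script" (any case) followed by one of: space, tab, CR, LF, FF, "/" or ">"
SCRIPT_END = re.compile(r"</script[ \t\r\n\f/>]", re.IGNORECASE)


def normalize_newlines(text):
    """Convert CRLF and lone CR to LF, as the HTML input stream does."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def naive_script_end(text):
    """Return the index (in the newline-normalized text) of the first
    `</script` terminator, or None if there is none.

    Assumption: no `<!--` appears earlier in the text, so the escaped
    and double-escaped states never come into play.
    """
    match = SCRIPT_END.search(normalize_newlines(text))
    return match.start() if match else None
```

Note that the terminator is recognized even inside a JavaScript string
literal: `naive_script_end('var s = "</SCRIPT >";')` finds it at index 9,
while `</scripts>` is not a terminator because `s` is not in the
character class.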

In addition, the exact character sequence `<!--` switches to "escaped
data" parsing.  This is a bit hairy, and you can even end up in
"double escaped" modes.  See
http://stackoverflow.com/questions/23727025/script-double-escaped-state
for an example.  Presumably these are the "insane parsing rules" under
discussion.  You are encouraged to try to follow the logic in the
WHATWG spec yourself. ;)
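
For those who would rather not, here is my own condensed reading of those
states as a Python sketch. It is not the spec's character-by-character
state machine: it collapses the states into three modes ("data",
"escaped", "double") and jumps between them with regular expressions.
The function name `find_script_end` and the mode names are made up for
illustration, and the input is assumed to be newline-normalized already.

```python
import re

_TAG_END = r"[ \t\n\f/>]"  # input assumed already newline-normalized
COMMENT_OPEN = re.compile(r"<!--")
COMMENT_CLOSE = re.compile(r"-->")
SCRIPT_OPEN = re.compile(r"<script" + _TAG_END, re.IGNORECASE)
SCRIPT_CLOSE = re.compile(r"</script" + _TAG_END, re.IGNORECASE)

# For each mode: the patterns that matter and the mode each one leads to.
TRANSITIONS = {
    "data": [(COMMENT_OPEN, "escaped"), (SCRIPT_CLOSE, "end")],
    "escaped": [(COMMENT_CLOSE, "data"), (SCRIPT_OPEN, "double"),
                (SCRIPT_CLOSE, "end")],
    "double": [(COMMENT_CLOSE, "data"), (SCRIPT_CLOSE, "escaped")],
}


def find_script_end(text):
    """Return the index of the `</script` that really ends the script
    data, or None if the script is never terminated."""
    mode, pos = "data", 0
    while True:
        # Collect the next occurrence of every pattern relevant to this mode.
        hits = []
        for pattern, next_mode in TRANSITIONS[mode]:
            match = pattern.search(text, pos)
            if match:
                hits.append((match.start(), match, next_mode))
        if not hits:
            return None
        # The earliest occurrence wins.
        start, match, next_mode = min(hits, key=lambda hit: hit[0])
        if next_mode == "end":
            return start
        if mode == "data":
            # The two dashes of "<!--" may also begin a closing "-->",
            # so "<!-->" drops straight back to the data mode.
            pos = start + 2
        else:
            pos = match.end()
        mode = next_mode
```

With this sketch, `find_script_end("<!-- </script>")` returns 5 (the end
tag still terminates in escaped mode), while
`find_script_end("<!-- <script> </script>")` returns None: the inner
`<script>` switches to double-escaped mode, where `</script>` merely
drops back to escaped mode, so the element is left open.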

In addition, [Web ECMAScript](http://javascript.spec.whatwg.org/)
introduces two new single-line comment forms: `<!--` must be treated
as if it were `//`, and `-->` (with some crazy start-of-line
restrictions) is also treated as a single-line comment.

To some degree the line between the HTML parser and Web ECMAScript is
movable. Currently the HTML parser recognizes `<!--` and the related
tokens but pushes them into the data section of the script tag anyway;
one could just as easily imagine the HTML parser doing all the work and
stripping the "new comment forms" from the token stream.
  --scott

