5 June 2014 TC39 Meeting Notes

C. Scott Ananian ecmascript at cscott.net
Fri Jun 13 09:06:27 PDT 2014

On Thu, Jun 12, 2014 at 11:11 AM, Domenic Denicola
<domenic at domenicdenicola.com> wrote:
> I guess part of it is clarifying which part of "<script>'s insane parsing
> rules" we're talking about. From what I'm aware of there are quite a lot of
> different insanities; but I am fuzzy on the details. Does anyone know which
> rules are inherently necessary, and which are historical accidents or
> constraints?

I'll recap the rules for "script data state" from the WHATWG HTML spec.

As a general rule, `\r` and `\r\n` are converted to `\n`, and `\0` is
not allowed.
The case-insensitive sequence `</script` followed by a character in
`[ \t\r\n\f/>]` terminates the script data section.
(These constraints would be present for HTML-embedding.)
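
As a rough illustration of the two rules above, here is a small JavaScript
sketch. The names `normalizeNewlines`, `SCRIPT_END` and `endsScriptData` are
made up for this example, and the code only approximates what an HTML
tokenizer does: it ignores any `<!--` handling and the treatment of `\0`.

```javascript
// Hypothetical helpers, not taken from any spec or library.

// Convert `\r\n` and lone `\r` to `\n`.
const normalizeNewlines = (text) => text.replace(/\r\n?/g, "\n");

// `</script` (any letter case) followed by one of [ \t\r\n\f/>].
const SCRIPT_END = /<\/script[ \t\r\n\f\/>]/i;

// True if this chunk of script text would end the script data section
// under the simplified rule above.
const endsScriptData = (text) => SCRIPT_END.test(text);

console.log(JSON.stringify(normalizeNewlines("a\r\nb\rc")));
console.log(endsScriptData("var s = '</SCRIPT >';"));
console.log(endsScriptData("var s = '</scr' + 'ipt>';"));
```

The last line shows the usual workaround: splitting the literal so that the
terminating sequence never appears in the source text.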

In addition, the exact character sequence `<!--` switches to "escaped
data" parsing.  This is a bit hairy, and you can even end up in
"double escaped" modes.  See
for an example.  Presumably these are the "insane parsing rules" under
discussion.  You are encouraged to try to follow the logic in the
WHATWG spec yourself. ;)
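
To give a feel for the three modes, here is a deliberately simplified model
in JavaScript. It is a sketch under assumptions, not the spec's state
machine: the function names (`startsTag`, `findScriptEnd`) and mode labels
are invented, and it assumes that `<!--` enters the escaped mode, `-->`
leaves either escaped mode, `<script` inside escaped text enters the double
escaped mode, and `</script` inside double escaped text only drops back to
escaped.

```javascript
// True if `prefix` (e.g. "</script") occurs at `pos`, ignoring case,
// and is followed by one of [ \t\r\n\f/>].
function startsTag(text, pos, prefix) {
  const name = text.substr(pos, prefix.length).toLowerCase();
  const next = text.charAt(pos + prefix.length);
  return name === prefix && /[ \t\r\n\f\/>]/.test(next);
}

// Returns the index of the `</script` that ends the script data,
// or -1 if the text never terminates it (simplified model).
function findScriptEnd(text) {
  let mode = "data";
  for (let pos = 0; pos < text.length; pos++) {
    if (mode === "data") {
      if (text.startsWith("<!--", pos)) {
        mode = "escaped";
        pos += 1; // resume at the first dash so that `<!-->` closes at once
      } else if (startsTag(text, pos, "</script")) {
        return pos;
      }
    } else if (text.startsWith("-->", pos)) {
      mode = "data"; // leaves both escaped and double escaped
      pos += 2;
    } else if (mode === "escaped") {
      if (startsTag(text, pos, "</script")) return pos;
      if (startsTag(text, pos, "<script")) mode = "double escaped";
    } else if (startsTag(text, pos, "</script")) {
      mode = "escaped"; // double escaped: does NOT end the script
    }
  }
  return -1;
}

// The inner `</script>` is swallowed; only the last one terminates.
const text = "<!-- <script> </script> --> x</script>";
console.log(findScriptEnd(text) === text.lastIndexOf("</script>"));
```

In this model, `<!-- <script> </script>` on its own never terminates the
script data, which is the kind of surprise the double escaped mode produces.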

In addition, [Web EcmaScript](http://javascript.spec.whatwg.org/)
introduces two new single line comment forms: `<!--` must be treated
as if it were `//`, and `-->` (with some crazy start-of-line
restrictions) is also treated as a single line comment.
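
A quick way to see these two comment forms in action is to hand an engine
some source text that uses them. The sketch below assumes an engine that
implements the comment forms for non-module code (they are not allowed in
ES modules), and uses the `Function` constructor so that the odd-looking
lines live inside a string; the variable names are made up for the example.

```javascript
// Source text using both comment forms described above.
const source = [
  "var x = 1;",
  "<!-- x = 2;  ignored: `<!--` acts like `//`",
  "--> x = 3;   ignored: `-->` at the start of a line",
  "return x;",
].join("\n");

// The Function constructor parses its body as ordinary (non-module) code.
const run = new Function(source);
console.log(run()); // prints 1
```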

To some degree the line between the HTML parser and Web EcmaScript is
movable. Currently the HTML parser recognizes `<!--` and the related
tokens but still passes them through into the data section of the script
tag; one could just as easily imagine the HTML parser doing all the work
and stripping the "new comment forms" from the token stream.
