5 June 2014 TC39 Meeting Notes

Domenic Denicola domenic at domenicdenicola.com
Fri Jun 13 09:15:48 PDT 2014

Thanks Scott; much appreciated.

IMO it would be a good universe where `<module>` had the following things `<script>` has:

- Does not require escaping < > & ' " in any contexts.
- Terminates when seeing `</module` + extra chars. (Possibly we could do this only when it would otherwise be a parsing error, to avoid `"</mod" + "ule>"` grossness? But that would require some intertwingling of the HTML and ES parsers, which I can imagine implementers disliking.)

But it would drop the following things `<script>` has:

- `<!--` escaped data mode and double-escaped mode
- \r, \r\n, \0 special-casing
- The two new single-line comment forms (maybe; I know these work in Node though, so maybe just leave them in as part of the ES6 spec).

Although I know some people think making `<script>` and `<module>` have different rules would be confusing for authors, IMO this would be a nice authoring experience.
From: cananian at gmail.com <cananian at gmail.com> on behalf of C. Scott Ananian <ecmascript at cscott.net>
Sent: Friday, June 13, 2014 12:06
To: Domenic Denicola
Cc: Mark S. Miller; es-discuss; Ben Newman
Subject: Re: 5 June 2014 TC39 Meeting Notes

On Thu, Jun 12, 2014 at 11:11 AM, Domenic Denicola
<domenic at domenicdenicola.com> wrote:
> I guess part of it is clarifying which part of "<script>'s insane parsing
> rules" we're talking about. From what I'm aware of there are quite a lot of
> different insanities; but I am fuzzy on the details. Does anyone know which
> rules are inherently necessary, and which are historical accidents or
> constraints?

I'll recap the rules for "script data state" from

As a general rule, `\r` and `\r\n` are converted to `\n`, and `\0` is
not allowed.
The case-insensitive sequence `</script` followed by a character in
`[ \t\r\n\f/>]` terminates the script data section.
(These constraints would be present for HTML-embedding.)
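
As a rough illustration of the two rules above, here is a minimal sketch in
JavaScript. The names `normalizeNewlines` and `findScriptEnd` are made up for
this example and are not part of any spec or library, and the sketch
deliberately ignores the `<!--` escaped modes described next.

```javascript
// Sketch only: models the plain "script data state" and ignores the
// "<!--" escaped and double-escaped states entirely.

// Newline normalization: "\r\n" and a lone "\r" both become "\n".
function normalizeNewlines(src) {
  return src.replace(/\r\n?/g, "\n");
}

// Index of the first "</script" (any case) followed by one of
// space, tab, CR, LF, FF, "/" or ">", or -1 if there is none.
function findScriptEnd(src) {
  const match = /<\/script[ \t\r\n\f\/>]/i.exec(src);
  return match ? match.index : -1;
}

// The terminator is found even inside a string literal:
console.log(findScriptEnd('var s = "</script>";'));
// -1: the split-string workaround hides the sequence
console.log(findScriptEnd('var s = "</scr" + "ipt>";'));
// -1: "s" is not one of the terminating characters
console.log(findScriptEnd("var scripts = x</scripts>"));
```

The first call is the reason authors end up writing `"</scr" + "ipt>"`: the
HTML parser knows nothing about JS string literals.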

In addition, the exact character sequence `<!--` switches to "escaped
data" parsing.  This is a bit hairy, and you can even end up in
"double escaped" modes.  See
for an example.  Presumably these are the "insane parsing rules" under
discussion.  You are encouraged to try to follow the logic in the
WHATWG spec yourself. ;)

In addition, [Web EcmaScript](http://javascript.spec.whatwg.org/)
introduces two new single line comment forms: `<!--` must be treated
as if it were `//`, and `-->` (with some crazy start-of-line
restrictions) is also treated as a single line comment.
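
To make the two comment forms concrete, here is a naive, line-oriented sketch
that rewrites them to `//`. The function name `rewriteHtmlLikeComments` is
hypothetical, and the start-of-line restriction on `-->` is only approximated
as "nothing but whitespace before it on the line". It does not track string,
regexp, or template literals, or multi-line comments, so it shows the idea
and is not a faithful implementation.

```javascript
// Naive sketch: treats "<!--" and a line-leading "-->" as "//".
// Assumption: no string/regexp/template literals or block comments
// contain these sequences; a real lexer has to handle those.
function rewriteHtmlLikeComments(src) {
  return src
    .split("\n")
    .map((line) => {
      // "-->" only counts when preceded by nothing but whitespace
      // (an approximation of the actual start-of-line rule).
      if (/^\s*-->/.test(line)) {
        return line.replace("-->", "//");
      }
      // "<!--" starts a single-line comment anywhere on the line.
      const i = line.indexOf("<!--");
      return i === -1 ? line : line.slice(0, i) + "//" + line.slice(i + 4);
    })
    .join("\n");
}

console.log(rewriteHtmlLikeComments("<!-- hide from old browsers\nvar x = 1;\n--> end hiding"));
```

Note that `a --> b` in the middle of a line is left alone, since there it is
just a postfix decrement followed by a greater-than comparison.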

To some degree the line between the HTML parser and Web EcmaScript is
movable; currently the HTML parser recognizes the `<!--` etc tokens
but pushes them into the data section of the script tag anyway; one
could just as easily imagine the HTML parser doing all the work and
stripping the "new comment forms" from the token stream.
