JSON.stringify </script>

Mike Samuel mikesamuel at gmail.com
Thu Sep 29 15:08:53 UTC 2016


On Thu, Sep 29, 2016 at 2:09 AM, Alexander Jones <alex at weej.com> wrote:
> In XHTML, CDATA allows a 'more' verbatim spelling of text node content. But
> the end token has to be escaped, as discussed. Despite this escaping, the
> text node can contain arbitrary strings.



> In XHTML, you *can* achieve the same effect without CDATA, just by escaping
> XML entities. Again, and crucially, the text node can contain arbitrary
> strings.

So, <script><![CDATA[...]]></script> has a complete escaping process,
whereas HTML does not: CDATA sections were taken out of HTML foreign
element content, which disallows
  <svg><script><![CDATA[...]]></script></svg>
so to figure out how to embed

  alert("</script>");
  if (a < /script>/.exec(myString)) ...

you have to do scripting language specific analysis.

Is that about right?
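
To make the "language specific analysis" point concrete, here is a
minimal sketch, not anything proposed in this thread. The helper names
naiveEscape and parses are made up for illustration, and the second
snippet is written without the space so that it really contains
"</script". A context-blind rewrite of "</script" is harmless inside a
string literal but breaks the program when the same characters appear
as "<" followed by a regex literal.

```javascript
// Hypothetical helper: rewrite "</script" to "<\/script" without looking
// at the JS context it appears in.
function naiveEscape(js) {
  return js.replace(/<\/(script)/gi, '<\\/$1');
}

// Hypothetical helper: true if the text parses as a function body.
// new Function is used here only as a convenient syntax check.
function parses(body) {
  try {
    new Function('a', 'script', 'myString', body);
    return true;
  } catch (e) {
    return false;
  }
}

// Inside a string literal, "\/" is just another spelling of "/".
const inString = 'return "</script>";';
const fixedString = naiveEscape(inString);
const sameValue = new Function(fixedString)() === '</script>';

// Outside a literal, the stray backslash is a syntax error.
const inExpr = 'return a</script>/.exec(myString);';
const fixedExpr = naiveEscape(inExpr);

console.log(sameValue);          // true
console.log(parses(inExpr));     // true
console.log(parses(fixedExpr));  // false
```

So a correct embedder has to know whether it is inside a string, a
regex, or plain code before it can pick a rewrite.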


> In HTML without CDATA, using HTML entities within the script tag is wrong
> specifically because they are *not* interpreted. The text node in the HTML
> document CANNOT contain arbitrary strings, and there is no further decode
> step before the JS parser hits your code, so you're forced to take other
> measures to ensure that `</script>` does not appear in your code. There are
> a few places this can appear, only one of which is embedded in string
> literals, so the method of avoiding this is actually sensitive to the
> context and not practical to specify.



> I hope you can appreciate how ridiculous this problem is for HTML - I don't
> believe CDATA support in HTML 5 can solve this due to forward compatibility
> - which is why it's an antipattern. Just don't do it, or use XHTML. It's not
> cool to hate on XML anymore. ;)

Yes.  I've written hardened DOM tree serializers.  I appreciate these problems.
No-one is hating on XML.

We're talking about JSON serializers.  Every JSON serializer produces
a subset of the output language. Choices about that sublanguage affect
how easy/hard it is to use that serializer with other tools.
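
As a rough sketch of what choosing a sublanguage can look like (the
wrapper name stringifyForScript is made up, and this is one possible
choice, not the exact encoding proposed at the start of the thread):
the output below is still valid JSON and parses to the same value, but
it never contains "<", so it cannot contain "</script>".

```javascript
// Hypothetical wrapper around JSON.stringify; the name is illustrative only.
function stringifyForScript(value) {
  const json = JSON.stringify(value);
  // JSON.stringify returns undefined for e.g. undefined or a function.
  if (json === undefined) return json;
  // In JSON.stringify output "<" can only occur inside string literals,
  // where \u003c is an equivalent spelling, so the parsed value is unchanged.
  return json.replace(/</g, '\\u003c');
}

const input = { html: '</script><script>alert(1)</script>', n: 1 };
const out = stringifyForScript(input);

console.log(out.includes('<'));                    // false
console.log(JSON.parse(out).html === input.html);  // true
```

Consumers that only call JSON.parse see no difference; only tools that
compare the serialized bytes would notice.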

The claim that "if everyone wrote software with property P, we would not
have problem Q" is a great argument for preferring stacks with property P,
but it does not mean we should ignore the prevalence of problem Q when
designing elements of software stacks.  You seem to be arguing that we
should not do our best to prevent problem Q by other means, but real
systems need defense-in-depth.

So I concede your point about CDATA sections but don't see that these
arguments about antipatterns and the benefits of XHTML are all that
relevant.



> Alex
>
>
>
> On Thursday, 29 September 2016, Mike Samuel <mikesamuel at gmail.com> wrote:
>>
>> Without CDATA you have to encode script bodies properly.  With CDATA you
>> have to encode script bodies properly.  What problem did CDATA solve?
>>
>>
>> On Sep 28, 2016 8:03 PM, "Alexander Jones" <alex at weej.com> wrote:
>>>
>>> They do solve the problem. You encode your entire JS *before* pasting it,
>>> encoding `]]>` and nothing more, and the XML document's text node contains
>>> the unadulterated text, which the JS parser also sees. It's perfect layer
>>> isolation. Ye olde HTML can't do that because there is no escaping mechanism
>>> for `</script>` that actually allows the JS parser to see the text (code)
>>> content unmodified.
>>>
>>> Viva la `<xhtml:revolución />` ;)
>>>
>>> On Wednesday, 28 September 2016, Mike Samuel <mikesamuel at gmail.com>
>>> wrote:
>>>>
>>>> I agree it's subideal which is why I work to address problems like this
>>>> in template systems but ad-hoc string concatenation happens and embeddable
>>>> sub-languages provide defense-in-depth without sacrificing correctness.
>>>>
>>>> CDATA sections solve no problems because they cannot contain any string
>>>> that has "]]>" as a substring so you still have to s/\]\]>/]]]]><![CDATA[>/g.
>>>>
>>>>
>>>> On Sep 28, 2016 2:32 PM, "Alexander Jones" <alex at weej.com> wrote:
>>>>>
>>>>> That's awful. As you say, it's an antipattern, no further effort should
>>>>> be spent on this. JSON produced by JavaScript has far more general uses than
>>>>> slapping directly into a script tag unencoded, so no-one else should have to
>>>>> see this. Also, there are many other producers of JSON than JavaScript.
>>>>>
>>>>> Instead, use XHTML and CDATA (which has a straightforward encoding
>>>>> mechanism that doesn't ruin the parseability of the code or affect it in any
>>>>> way) if you really want to pull stunts like this.
>>>>>
>>>>> Alex
>>>>>
>>>>> On Wednesday, 28 September 2016, Michał Wadas <michalwadas at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Idea: require implementations to stringify "</script>" as
>>>>>> "<\uxxxxscript>".
>>>>>>
>>>>>> Benefits: remove XSS vulnerability when injecting JSON as content of
>>>>>> <script> tag (quite common antipattern).
>>>>>>
>>>>>> Backward compatible: yes, unless binary equality is required and this
>>>>>> string is used.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> es-discuss mailing list
>>>>> es-discuss at mozilla.org
>>>>> https://mail.mozilla.org/listinfo/es-discuss
>>>>>
>

