JSON.stringify </script>

Mike Samuel mikesamuel at gmail.com
Thu Sep 29 18:29:28 UTC 2016


On Wed, Sep 28, 2016 at 10:06 AM, Michał Wadas <michalwadas at gmail.com> wrote:
> Idea: require implementations to stringify "</script>" as "<\uxxxxscript>".
>
> Benefits: remove XSS vulnerability when injecting JSON as content of
> <script> tag (quite common antipattern).
>
> Backward compatible: yes, unless binary equality is required and this string
> is used.

TLDR; I'm against this.

I've pushed back against a number of threads, so I want to avoid
leaving the impression that I support this proposal.

I think this is a bad idea, so let me try to pull together the various
threads and address them in one place.


Should EcmaScript or any other standards body define "embeddable JSON"?
============================================================
No.  The main argument for this feature is to make it easier to write
more secure code, and to transparently make existing code more secure.

Standards bodies move too slowly for that.  Library code can roll out
quickly in response to zero-days or emerging threats, but standards
cannot.

For example, client-side templates using mustaches ( goo.gl/eztprF )
are an emerging threat.

There has been a poor history of this, even with JSON.  Crock's RFC 4627 said
"""
    A JSON text can be safely passed into JavaScript's eval() function
   (which compiles and executes a string) if all the characters not
   enclosed in strings are in the set of characters that form JSON
   tokens.  This can be quickly determined in JavaScript with two
   regular expressions and calls to the test and replace methods.

      var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
             text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
         eval('(' + text + ')');
"""
which is not in the latest JSON RFC because it was found to be false
in a dozen ways before RFC 7158 (since obsoleted) removed that
language.
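
As a small illustration of how that check over-accepts, here is a
sketch that wraps the quoted regular expressions in a function and
compares the result with JSON.parse.  The function names and the
sample inputs are mine, not the RFC's, and these are simple examples
rather than a catalogue of the known problems.

```javascript
// The check quoted from RFC 4627, wrapped in a function.
// (The function name is hypothetical; the regular expressions are the RFC's.)
function rfc4627LooksSafe(text) {
  return !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
      text.replace(/"(\\.|[^"\\])*"/g, '')));
}

// Returns true if JSON.parse accepts the text as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// The first sample is real JSON.  The other two pass the RFC check,
// so they would reach eval(), but JSON.parse rejects them.
const samples = ['{"a": [1, 2]}', '1+1', '[1,]'];
for (const text of samples) {
  console.log(JSON.stringify(text),
              'passes check:', rfc4627LooksSafe(text),
              'valid JSON:', isValidJson(text));
}
```

Both `1+1` and `[1,]` consist only of characters that the first
regular expression allows, so the check treats them as JSON tokens
even though neither is a JSON text.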

The only way to deal with emerging threats is to have a quickly
patchable system.  Patching serializers causes spurious test failures,
the broken-hearts problem:
   assertEquals("I <3 u", serializeHtml("I <3 u"))
I suspect that the best we will ever be able to do about emerging
threats is to allow those who care about security to patch and fix
tests, and to ignore the maintenance cost to unmaintained projects.
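
To make the broken-hearts problem concrete in JavaScript, here is a
sketch with two hypothetical serializers: serializeV1 stands for the
original behavior and serializeV2 for a security patch that also
escapes "<".  Neither name comes from a real library.  A test that
pins the exact output breaks under the patch, while a test that checks
the round trip does not.

```javascript
// Hypothetical serializer before a security patch.
function serializeV1(s) {
  return JSON.stringify(s);
}

// Hypothetical serializer after a patch that escapes "<" as \u003c.
function serializeV2(s) {
  return JSON.stringify(s).replace(/</g, '\\u003c');
}

// Brittle test: pins the exact characters of the output.
function brittleTest(serialize) {
  return serialize('I <3 u') === '"I <3 u"';
}

// More robust test: only requires that parsing recovers the input.
function roundTripTest(serialize) {
  return JSON.parse(serialize('I <3 u')) === 'I <3 u';
}

console.log('brittle, v1:', brittleTest(serializeV1));      // true
console.log('brittle, v2:', brittleTest(serializeV2));      // false
console.log('round trip, v1:', roundTripTest(serializeV1)); // true
console.log('round trip, v2:', roundTripTest(serializeV2)); // true
```

The brittle failure under v2 is spurious in the sense that both
outputs decode to the same string.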


Is there any value in embeddable sanitizers?
=================================
I think embeddable serializers can provide defense-in-depth against
faults in code that composes network messages, which is why I wrote
https://github.com/OWASP/json-sanitizer to do just that.
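
For readers who want a feel for what a library-level embeddable
serializer might look like, here is a minimal sketch.  It is not the
json-sanitizer implementation, which is a Java library with a
different scope; the name stringifyForScript and the choice of
characters to escape are assumptions made for illustration.

```javascript
// Characters that are harmless in JSON but risky when the JSON is
// embedded in HTML or in older JavaScript parsers.  (Assumed set.)
const ESCAPES = {
  '<': '\\u003c',
  '>': '\\u003e',
  '&': '\\u0026',
  '\u2028': '\\u2028',
  '\u2029': '\\u2029',
};

// Hypothetical helper: serialize to JSON, then replace the risky
// characters with \uXXXX escapes.  In JSON.stringify output these
// characters can only occur inside string literals, where the escape
// is equivalent, so the parsed value is unchanged.
function stringifyForScript(value) {
  const json = JSON.stringify(value);
  if (json === undefined) {
    throw new TypeError('value is not JSON-serializable');
  }
  return json.replace(/[<>&\u2028\u2029]/g, (ch) => ESCAPES[ch]);
}

const payload = { html: '</script><script>alert(1)//' };
const out = stringifyForScript(payload);
console.log(out);
console.log(JSON.parse(out).html === payload.html);
```

Because this lives in library code, the escape set can be widened
quickly if a new embedding hazard turns up.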


Is this backwards compatible?
=======================
No.  JSON strings are used as keys in persisted tables because we have
de-facto defined a canonical subset of JSON.
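
A sketch of how that breaks: a table keyed by today's JSON.stringify
output stops matching once the serializer starts escaping.  The
proposal leaves the escape as "\uxxxx", so the use of \u002f for "/"
below is my assumption, as are the function names and the in-memory
Map standing in for a persisted table.

```javascript
// Key function as deployed today.
function currentKey(obj) {
  return JSON.stringify(obj);
}

// Key function under the proposed behavior.  Escaping "/" as \u002f
// is an assumption; the proposal does not name the code unit.
function proposedKey(obj) {
  return JSON.stringify(obj).replace(/<\/(script)/gi, '<\\u002f$1');
}

// Stand-in for a persisted table whose keys were written earlier.
const table = new Map();
const record = { tag: '</script>' };
table.set(currentKey(record), 'row 1');

console.log(table.has(currentKey(record)));  // true
console.log(table.has(proposedKey(record))); // false: the key no longer matches
```

Both keys parse to the same value, but they are different strings, so
a byte-wise lookup misses.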

This kind of dependence can be discouraged by randomizing output, the
way Java is doing with built-in map implementations in Java 9, which
also helps avoid broken-hearts problems.  Java is a large-API language,
so it can provide umpteen variants of x in a way that wouldn't fit well
in ES, and providing an alternate API loses a lot of the benefit of the
original proposal.


Are embeddable serializers an anti-pattern?
========================================
No.  The anti-pattern is that trustworthy and untrustworthy content
are mixed using naive string concatenation to produce a trusted
output.

Even if the real anti-pattern were not endemic within distributed
systems, composing trustworthy network messages is hard, and embeddable
serializers provide useful defense-in-depth for message-composing
code.
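
The anti-pattern described above can be sketched as follows.  The
function names are hypothetical, and scriptBody is only a rough
approximation of where an HTML parser ends a script element; it
ignores the parser's comment-like escaped states.

```javascript
// Naive composition: trusted markup and untrusted data are joined
// by plain string concatenation.
function naiveEmbed(data) {
  return '<script>var data = ' + JSON.stringify(data) + ';</script>';
}

// Rough approximation of what the HTML parser treats as the script
// body: everything up to the first "</script" followed by
// whitespace, "/" or ">".
function scriptBody(html) {
  const start = html.indexOf('<script>') + '<script>'.length;
  const rest = html.slice(start);
  const end = rest.search(/<\/script[\s/>]/i);
  return end === -1 ? rest : rest.slice(0, end);
}

const untrusted = { comment: '</script><img src=x onerror=alert(1)>' };
const page = naiveEmbed(untrusted);

// The script body is cut off inside the string literal, and the rest
// of the untrusted value is parsed as HTML.
console.log(scriptBody(page));
```

The JSON itself is well formed here; the fault is that the embedding
language, HTML, finds the end of the script before JavaScript ever
sees the string.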


Is XHTML more easily secured than HTML?
======================
Yes.  XML is much more easily statically analyzed, and mistaken
assumptions in a serializer much more frequently manifest as parse
failures, so they fail safe more often.  When the embedding language
fails safe, the whole is more secure than when an embedded language
that fails safe sits inside an embedding language that does not, as
is the case with JSON in HTML.

This is why, when I write an HTML sanitizer or hardened DOM
serializer, I try to make the output the intersection of HTML &
vanilla XML+namespaces.  (Incidentally, this prevents use of CDATA
sections, so serializers have included JS rewriters.)

At the risk of FUD though, XHTML-specific parsing branches might be
simpler but have been much less heavily tested and fuzzed, so it might
actually be easier to craft a buffer overflow to take over the
renderer for an origin that serves XHTML than one that serves HTML
exclusively.

The security of XHTML is not relevant though, because XHTML isn't used.

To anyone who is passionate about the benefits of making HTML more
XML-like, I would be happy to help with a proposal to the
content-security-policy team or similar body to add a switch that says
that the parsing should halt as soon as it is realized that the
content is not syntactically valid XML to get the fail-safe benefits
of XML.

