quasi-literal strawman

Mike Samuel mikesamuel at gmail.com
Wed Dec 16 12:21:09 PST 2009


2009/12/16 Andy Chu <andy at chubot.org>:
>>> If you haven't yet read http://www.python.org/dev/peps/pep-3101/ (Advanced String Formatting) I suggest you do - it's well worth a read and feels like a possible very javascripty solution.
>>
>> I have not read it.  Thanks for the link.  It has a good summary of
>> alternate syntaxes and establishes a point partway between positional
>> and inline syntax.  It does include a bunch of format specifiers that
>> I think are incompatible with DSL schemes.
>
> I also like Python 3's string formatting design.  But it makes me
> wonder -- how much of the quasi-literal proposal can be done in a
> library?  I think basically all of string formatting could be
> implemented as a library in Python 2 -- you just need to implement
> your own .format() method on strings.

That a lot of formatting can be done in-library is a great point.

Hopefully, by providing a desugaring that can easily be back-ported to
older code by things like rewriting minifiers, and by implementing most
of the feature in library code that will run on older versions of JS,
we can allow people to use and experiment with
formatting/interpolation schemes even in code that needs to run on
legacy interpreters.

The quasi-literal proposal specifies only a desugaring, and the safe
interpolation scheme that I want to do is not part of this proposal
and would be done in a library.  I hope to convince W3C that this
library is something worth standardizing on and that innerHTML,
document.write, cssText, and other language encoding entry points into
the DOM internals should be aware of it.
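
As a rough sketch of what "only a desugaring" means here: assume a quasi
is rewritten into a call that passes the literal portions to the tag
function, followed by a call that passes the substitution values to the
function it returns.  That shape is inferred from the examples later in
this message, and the `concat` tag below is a hypothetical name used only
for illustration.  The rewritten form needs no new syntax, so it runs on
legacy interpreters.

```javascript
// Hypothetical tag function: receives the literal portions and returns
// a function that receives the substitution values.
function concat(var_args) {
  var literalPortions = Array.prototype.slice.call(arguments, 0);
  return function (var_args) {
    var out = [literalPortions[0]];
    for (var i = 0; i < arguments.length; ++i) {
      // Interleave each value with the literal portion that follows it.
      out.push(String(arguments[i]), literalPortions[i + 1]);
    }
    return out.join('');
  };
}

// concat`Hello, $name!` would, under the assumed desugaring, become:
var name = 'World';
var greeting = concat('Hello, ', '!')(name);
```

A rewriting minifier could emit the last line mechanically from the quasi
form.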



> JSON Template was explicitly designed to be "upwardly compatible" with
> Python 3k string formatting: http://code.google.com/p/json-template/
>
> Here's how you would embed HTML:
>
> '{name|html}: <a href="{url|html-attr-value}">{anchor|html}</a>'

This scheme could be built on top of quasis with a minor syntactic change.

jsont`{$name|html}: <a href="{$url|html-attr-value}">{$anchor|html}</a>{default=html}`

function jsont(var_args) {
  // The literal portions of the quasi, in order.
  var literalPortions = Array.prototype.slice.call(arguments, 0);
  var escapingModes = [];
  var defaultMode = void 0;
  var n = literalPortions.length;
  // A trailing {default=mode} sets the mode for unannotated substitutions.
  var m = literalPortions[n - 1].match(/\{default=([\w-]+)\}$/);
  if (m) {
    defaultMode = m[1];
    literalPortions[n - 1] = literalPortions[n - 1]
      .substring(0, literalPortions[n - 1].length - m[0].length);
  }
  // Substitution i sits between literal portions i and i + 1.  Strip the
  // surrounding "{" and "|mode}" and record the mode.
  for (var i = 0; i < n - 1; ++i) {
    if (/\{$/.test(literalPortions[i])
        && (m = literalPortions[i + 1].match(/^\|([\w-]+)\}/))) {
      literalPortions[i] = literalPortions[i].substring(
          0, literalPortions[i].length - 1);
      literalPortions[i + 1] = literalPortions[i + 1].substring(m[0].length);
      escapingModes[i] = m[1];
    } else {
      escapingModes[i] = defaultMode;
    }
  }
  // Literal portions occupy the even slots; escaped values fill the odd ones.
  var buffer = [];
  for (var i = n; --i >= 0;) {
    buffer[i * 2] = literalPortions[i];
  }
  return function (var_args) {
    var cloneBuffer = buffer.slice(0);
    for (var i = 0, n = arguments.length; i < n; ++i) {
      cloneBuffer[i * 2 + 1] = applyEscapingMode(
          arguments[i], escapingModes[i]);
    }
    return cloneBuffer.join('');
  };
}

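function applyEscapingMode(value, mode) {
  value = '' + value;  // Coerce to a string.
  switch (mode) {
    case 'html':
      // Escape & first so the entities added below are not re-escaped.
      return value.replace(/&/g, '&amp;')
         .replace(/</g, '&lt;').replace(/>/g, '&gt;');
    case 'html-attr-value':
      return applyEscapingMode(value, 'html')
         .replace(/"/g, '&#34;').replace(/'/g, '&#39;');
    default:
      // No known mode: fail closed on anything that could alter quoting.
      if (/["'<>&\\]/.test(value)) {
        throw new Error('Unsafe value for escaping mode ' + mode);
      }
      return value;
  }
}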
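
To make the intermediate state concrete, here is a sketch that isolates
the parsing step and runs it on the literal portions of the HTML example.
`extractModes` is a hypothetical helper, not part of the proposal, and the
input array assumes the quasi is split at each $name substitution.

```javascript
// Hypothetical helper: the parsing half of a jsont-style tag function.
function extractModes(literalPortions) {
  var literals = literalPortions.slice(0);
  var modes = [];
  var defaultMode;  // Stays undefined unless {default=...} is present.
  var last = literals.length - 1;
  var m = literals[last].match(/\{default=([\w-]+)\}$/);
  if (m) {
    defaultMode = m[1];
    literals[last] = literals[last].slice(0, -m[0].length);
  }
  for (var i = 0; i < last; ++i) {
    // A substitution is annotated when it is wrapped as {$x|mode}.
    m = /\{$/.test(literals[i]) && literals[i + 1].match(/^\|([\w-]+)\}/);
    if (m) {
      literals[i] = literals[i].slice(0, -1);
      literals[i + 1] = literals[i + 1].slice(m[0].length);
      modes[i] = m[1];
    } else {
      modes[i] = defaultMode;
    }
  }
  return { literals: literals, modes: modes };
}

var parsed = extractModes([
  '{', '|html}: <a href="{', '|html-attr-value}">{', '|html}</a>{default=html}'
]);
// parsed.literals is ['', ': <a href="', '">', '</a>']
// parsed.modes is ['html', 'html-attr-value', 'html']
```

What remains is alternating literal strings and per-substitution escaping
modes, which is all the returned function needs at call time.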

> where html escapes <>& and html-attr-value escapes <>&".
>
> You can set a default formatter for security:
>
> t = jsontemplate.Template('{name}: <a
> href="{url|html-attr-value}">{anchor}</a>', default_formatter='html')
>
> It's compiled into alternating literal strings and substitution nodes.
>
> I like the idea of "enabling DSLs", but I feel like this proposal is a
> DSL itself, rather than enabling them, since it has a fairly
> particular syntax, and you have defined the parse tree very
> specifically.

I'm not sure I follow.  Are you referring to the `...` syntax with
embedded $foo and ${expression} chunks?


> Another Python analogy is that they chose not to embed regex's in the
> language, unlike JavaScript/Perl/Ruby.  Instead there is a very
> minimal syntactic accommodation -- raw strings which don't have
> backslash escaping.  The Go language takes this same approach with
> backticks I believe (e.g. `\s+` and not "\\s+").

This proposal gets you raw strings easily :)

new RegExp(r`\s+foo\s+`, 'i')

function r(string) {
  if (arguments.length != 1) { throw new Error(); }
  return function () { return string; };  // Trivially inlinable
}
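
For code that has to run today, the same call can be written out by hand.
This sketch assumes the desugaring passes the raw text of the literal to
r and then calls the result with no arguments; the error message is my
addition.

```javascript
function r(string) {
  // Exactly one literal portion means the quasi had no substitutions.
  if (arguments.length != 1) { throw new Error('r takes no substitutions'); }
  return function () { return string; };
}

// r`\s+foo\s+` desugared by hand.  In an ordinary string literal the
// backslashes must be doubled; the quasi form would not need that.
var re = new RegExp(r('\\s+foo\\s+')(), 'i');
var matched = re.test('a  FOO  b');
```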



> I do think JavaScript really needs better string interpolation than
> "foo " + var + " bar", which is unfortunately a common idiom.  I think
> that perhaps all that would be necessary is to have a .format() method
> on strings, like Python.  Python switched from the operator % to a
> simple method.

Yep.  Except that Python is planning on supporting the % operator for
some time to come, right?  One other nice side-effect of providing a
generic platform for DSLs and doing formatting/interpolation in a
library is that applications can have as many of these schemes side by
side as they like, and when one becomes obsolete, you only have to
deprecate library code instead of language syntax or core object
methods.

One oft-ignored criterion for judging string formatting schemes is how
resistant they are to quoting confusion.
Python 3's string formatting is really bad in this respect.  Its
security considerations section does a good job of pointing out that
format strings from untrusted sources are a problem and should not
be used, but does not mention the other side of the problem --
substitution values from untrusted sources.
Reference 2 in the proposal argues that requiring developers to
specify an escaping scheme, or do it manually, is shifting a large and
unnecessary burden onto them, and when they make errors, those errors
often result in vulnerabilities.
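
A small illustration of that other side of the problem, with a trusted
format string and an untrusted substitution value.  Both formatters are
hypothetical stand-ins written for this message; neither is Python's
str.format nor part of the proposal.

```javascript
// A .format()-style helper that knows nothing about the output language.
function naiveFormat(template, values) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    return String(values[key]);
  });
}

function escapeHtml(text) {
  return String(text).replace(/&/g, '&amp;').replace(/</g, '&lt;')
      .replace(/>/g, '&gt;').replace(/"/g, '&#34;').replace(/'/g, '&#39;');
}

// The same helper, but every substitution value is escaped by default.
function htmlFormat(template, values) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    return escapeHtml(values[key]);
  });
}

var template = '<p>Hello, {name}</p>';           // trusted
var untrusted = '<script>alert(1)</script>';     // attacker-controlled
var unsafe = naiveFormat(template, { name: untrusted });
var safe = htmlFormat(template, { name: untrusted });
```

Here unsafe carries the script element through intact, while safe renders
it as inert text.  The developer wrote the same template in both cases;
only the library's default differs.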

> Andy

