quasi-literal strawman

Andy Chu andy at chubot.org
Thu Dec 17 00:46:54 PST 2009


> That a lot of formatting can be done in-library is a great point.
>
> Hopefully, by providing a desugaring that can easily be back-ported to
> older code by things like rewriting minifiers, and implementing most
> of the feature in library code that will run on older versions of JS;
> we can allow people to use and experiment with
> formatting/interpolation schemes even in code that needs to run on
> legacy interpreters.

Right, I like the idea of being able to run in ES3.1/5 implementations
(at the cost of speed).

> The quasi-literal proposal specifies only a desugaring, and the safe
> interpolation scheme that I want to do is not part of this proposal
> and would be done in a library.  I hope to convince W3C that this
> library is something worth standardizing on and that innerHTML,
> document.write, cssText, and other language encoding entry points into
> the DOM internals should be aware of it.

So then my question is why it needs to specify a desugaring.  Why is a
quasi-literal not a string?  What is typeof html`foo`?  Is it
"function"?

Can you elaborate on the relationship to the DOM?  I didn't see it in
the doc.  So you're saying that innerHTML can be set to a
quasi-literal now, in addition to a string?  I don't see the situation
where you can't just expand the quasi-literal to a string and then set
innerHTML.

I think some more example applications in the doc would help.  Right
now I don't see much difference between quasi-literals and a template
language as a library, but I may be missing something.


> This scheme could be built on top of quasis with a minor syntactic change.
>
> jsont`{$name:html}: <a
> href="{$url|html-attr-value}">{$anchor|html}</a>{default=html}`
>
> function jsont(var_args) {
>  var literalPortions = Array.prototype.slice.call(arguments, 0);
>  var escapingModes = [];

Interesting, this API is not that unlike JSON Template's API.  I'm not
sure I see a big difference in functionality or safety either way.

I would argue with this statement from your doc: "First, full blown
templating languages, with a few exceptions, do next to nothing to
solve escaping problems."

This is probably true of PHP and JSP, but more modern template
languages have "formatters/filters" built in.  Django,
google-ctemplate, and JSON Template have this.  When combined with an
option for a default filter, this "solves" escaping AFAICT.  Do
quasi-literals do it better?  You are making an early/late binding
argument, but I don't see when this becomes necessary.
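
To make the "formatters plus a default filter" point concrete, here is
a rough sketch in Python.  The formatter names are the ones used in
this thread; expand(), FORMATTERS and the {name|formatter} regex are
made up for illustration and are not the API of Django, google-ctemplate
or JSON Template.

```python
import html
import re

# Hypothetical formatter table; the names mirror the ones used in this thread.
FORMATTERS = {
    "html": lambda s: html.escape(s, quote=False),
    "html-attr-value": lambda s: html.escape(s, quote=True),
    "raw": lambda s: s,
}

# Matches {name} or {name|formatter}.
_SUBST = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)(?:\|([a-z-]+))?\}")


def expand(template, data, default_formatter="html"):
    """Substitute {name} and {name|formatter}; bare names get the default."""
    def replace(match):
        name = match.group(1)
        formatter = match.group(2) or default_formatter
        return FORMATTERS[formatter](str(data[name]))
    return _SUBST.sub(replace, template)


result = expand(
    '<a href="{url|html-attr-value}">{anchor}</a>',
    {"url": 'x" onclick="evil()', "anchor": "<b>hi</b>"},
)
```

Here {anchor} names no formatter, so it gets the default "html" escape.
Forgetting to pick a formatter therefore gives escaped output rather
than raw output.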

If it is because variables come from the calling scope rather than the
scope receiving the quasi-literal, then let me propose just using
something like locals() in Python.

def foo():
  a = 1
  s = expandTemplate("{a}", locals())

Now expandTemplate receives the argument {"a": 1} and can return the string "1".
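
A runnable version of the same idea, as a minimal sketch.
expand_template here is a stand-in I wrote for the expandTemplate call
above, and it only handles plain {name} substitutions.

```python
import re


def expand_template(template, variables):
    """Replace each {name} in template with str(variables[name])."""
    return re.sub(r"\{(\w+)\}", lambda m: str(variables[m.group(1)]), template)


def foo():
    a = 1
    # locals() hands the callee a dict of the caller's variables: {"a": 1}
    return expand_template("{a}", locals())


s = foo()  # "1"
```

The callee never sees the caller's scope directly, only the dictionary
it was handed, so no new syntax is involved.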


>> I like the idea of "enabling DSLs", but I feel like this proposal is a
>> DSL itself, rather than enabling them, since it has a fairly
>> particular syntax, and you have defined the parse tree very
>> specifically.
>
> I'm not sure I follow.  Are you referring to the `...` syntax with
> embedded $foo and ${expression} chunks?

Yes, I don't see why this should be hard-coded in the language.  It's
a third set of escaping rules to learn, after strings and regexes.
Arguably it's a fourth, since regexes already have two sets: one inside
character classes like [^$] and one outside.

I also think the syntax is complicated ( \${}` are special, as opposed
to strings where ""\ are special, and regexes where / is special).  I
wouldn't be at all surprised if it needs to grow based on some new use
cases.

For substitution, let me plug the JSON Template scheme: "{foo}" is a
substitution.  If the string contains {}, then choose [] as the
metacharacters: Template("[foo]", meta="[]").  So the default
meta="{}".

That's it.  IMHO this is the simplest possible scheme that covers all
applications.  Any character you pick will be suboptimal for some DSL
-- in particular quasi-literals themselves.  How do you write a
quasi-literal for quasi-literals?  My guess is it will look pretty
nasty.
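
The substitution scheme with selectable metacharacters can be sketched
in a few lines of Python.  This is only an illustration of the idea:
the Template class below and its expand() method are written for this
message and are not JSON Template's actual implementation.

```python
import re


class Template:
    """Substitute <left>name<right>, where meta supplies the two characters."""

    def __init__(self, text, meta="{}"):
        if len(meta) != 2:
            raise ValueError("meta must be exactly two characters")
        left, right = re.escape(meta[0]), re.escape(meta[1])
        self._text = text
        self._pattern = re.compile(left + r"(\w+)" + right)

    def expand(self, data):
        return self._pattern.sub(lambda m: str(data[m.group(1)]), self._text)


greeting = Template("Hello {foo}").expand({"foo": "world"})

# The text itself contains braces, so switch the metacharacters to [].
snippet = Template("function f() { return [foo]; }", meta="[]").expand({"foo": 42})
```

With meta="[]" the braces in the second template are ordinary text, so
nothing inside the template needs escaping.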

I don't see why the metadata needs to be inside the quasi-literal, as
opposed to just being another argument to a function that takes a
quasi-literal.

>> Another Python analogy is that they chose not to embed regex's in the
>> language, unlike JavaScript/Perl/Ruby.  Instead there is a very
>> minimal syntactic accommodation -- raw strings which don't have
>> backslash escaping.  The Go language takes this same approach with
>> backticks I believe (e.g. `\s+` and not "\\s+").
>
> This proposal gets you raw strings easily :)
>
> new RegExp(r`\s+foo\s+`, 'i')
>
> function r(string) {
>  if (arguments.length != 1) { throw new Error(); }
>  return function () { return string; };  // Trivially inlinable
> }

I view the /\s+/ syntax for regexes as superfluous and overly
specific, so if this mechanism can somehow generalize that and retire
the old syntax, that's a plus.

>> I do think JavaScript really needs better string interpolation than
>> "foo " + var + " bar", which is unfortunately a common idiom.  I think
>> that perhaps all that would be necessary is to have a .format() method
>> on strings, like Python.  Python switched from the operator % to a
>> simple method.
>
> Yep.  Except that python is planning on supporting the % operator for
> some time to come, right?  One other nice side-effect of providing a
> generic platform for DSLs and doing formatting/interpolation in a
> library is that applications can have as many of these schemes side by
> side as they like, and when one becomes obsolete, you only have to
> deprecate library code instead of language syntax or core object
> methods.

I totally agree with that, but simply using a library gets even more
of those benefits.  So far, the 2 things I see in quasi-literals that
you can't do with a library are:

1) The syntax -- however as mentioned I don't find the syntax to be a
benefit.  Is html`foo$bar` better than html("foo{bar}") ?  I
personally like how ES5 introduced no new syntax.

2) The locals() thing.  This would be a much smaller addition to the language.

Am I missing something? (could be)

> One oft ignored criterion for judging string formatting schemes is how
> resistant they are to quoting confusion.
> The python3 string formatting is really bad in this respect.  Its
> security considerations section does a good job of pointing out that
> formatting strings from untrusted sources are a problem and should not
> be used, but does not mention the other side of the problem --
> substitution values from untrusted sources.

100% agree.  But, introducing another syntax leads to its own kind of
"quoting confusion".  As mentioned, how does a quasi-literal for a
quasi-literal look, or a regex for a quasi-literal, or a quasi-literal
for a regex?

How hard is it to write a program to extract all quasi-literals from JS
source, and analyze them statically?  New syntax makes this kind of
thing more complicated.  As it is, JS is not too hard to parse.

> Reference 2 in the proposal argues that requiring developers to
> specify an escaping scheme, or do it manually, is shifting a large and
> unnecessary burden onto them, and when they make errors, those errors
> often result in vulnerabilities.

So this is "auto-escaping", right?  In my example, this is saying that
you automatically detect based on the literal portions whether you
need to use the "html" escape or "html-attr-value" escape.  Most
template languages don't do this.  But there is no reason that they
can't.  I don't think the JS language is the barrier to doing this
now.
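
As a rough illustration that a library can do this, here is a toy
auto-escaper in Python.  It takes the literal portions and the
substitution values separately and picks the escape from the literal
text seen so far.  The function name and the tiny state machine are my
own invention.  It ignores comments, <script>, unquoted attributes and
URL schemes, so it is nowhere near adequate for real security.

```python
import html


def auto_escape(literals, values):
    """Join literals and values, escaping each value for its HTML context."""
    if len(literals) != len(values) + 1:
        raise ValueError("expected one more literal than values")
    out = []
    in_tag = False   # between '<' and '>'
    quote = None     # the open quote character inside a tag, if any
    for i, literal in enumerate(literals):
        for ch in literal:
            if not in_tag:
                if ch == "<":
                    in_tag = True
            elif quote is None:
                if ch in "\"'":
                    quote = ch
                elif ch == ">":
                    in_tag = False
            elif ch == quote:
                quote = None
        out.append(literal)
        if i < len(values):
            value = str(values[i])
            if not in_tag:
                # Text content: the "html" escape.
                out.append(html.escape(value, quote=False))
            elif quote is not None:
                # Quoted attribute value: the "html-attr-value" escape.
                out.append(html.escape(value, quote=True))
            else:
                raise ValueError("unsupported context for substitution")
    return "".join(out)


link = auto_escape(
    ['<a href="', '">', "</a>"],
    ['/q?a=1&b="2"', "<Tom & Jerry>"],
)
```

The caller never names an escaping mode.  The first value lands inside
a quoted attribute and the second in text content, and each gets the
matching escape.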

thanks,
Andy

