quasi-literal strawman
Andy Chu
andy at chubot.org
Thu Dec 17 00:46:54 PST 2009
> That a lot of formatting can be done in-library is a great point.
>
> Hopefully, by providing a desugaring that can easily be back-ported to
> older code by things like rewriting minifiers, and implementing most
> of the feature in library code that will run on older versions of JS,
> we can allow people to use and experiment with
> formatting/interpolation schemes even in code that needs to run on
> legacy interpreters.
Right, I like the idea of being able to run in ES3.1/5 implementations
(at the cost of speed).
> The quasi-literal proposal specifies only a desugaring, and the safe
> interpolation scheme that I want to do is not part of this proposal
> and would be done in a library. I hope to convince W3C that this
> library is something worth standardizing on and that innerHTML,
> document.write, cssText, and other language encoding entry points into
> the DOM internals should be aware of it.
So then my question is why it needs to specify a desugaring. Why is a
quasi-literal not a string? What is typeof html`foo`? Is it
"function"?
Can you elaborate on the relationship to the DOM? I didn't see it in
the doc. So you're saying that innerHTML can be set to a
quasi-literal now, in addition to a string? I don't see the situation
where you can't just expand the quasi-literal to a string and then set
innerHTML.
I think some more example applications in the doc would help. Right
now I don't see much difference between quasi-literals and a template
language as a library, but I may be missing something.
> This scheme could be built on top of quasis with a minor syntactic change.
>
> jsont`{$name:html}: <a href="{$url|html-attr-value}">{$anchor|html}</a>{default=html}`
>
> function jsont(var_args) {
>   var literalPortions = Array.prototype.slice.call(arguments, 0);
>   var escapingModes = [];
Interesting, this API is not that unlike JSON Template's API. I'm not
sure I see a big difference in functionality or safety either way.
I would argue with this statement from your doc: "First, full blown
templating languages, with a few exceptions, do next to nothing to
solve escaping problems."
This is probably true of PHP and JSP, but more modern template
languages have "formatters/filters" built in. Django,
google-ctemplate, and JSON Template have this. When combined with an
option for a default filter, this "solves" escaping AFAICT. Do
quasi-literals do it better? You are making an early/late binding
argument, but I don't see when this becomes necessary.
If it is because variables come from the calling scope rather than the
scope receiving the quasi-literal, then let me propose just using
something like locals() in Python.
def foo():
    a = 1
    s = expandTemplate("{a}", locals())
Now expandTemplate receives the argument {"a": 1} and can return the string "1".
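For comparison, the formatter-plus-default-filter approach mentioned above can be sketched as ordinary library code that runs on ES3/5 engines. This is a minimal illustration, not the API of Django, google-ctemplate, or JSON Template: the names expand and FORMATTERS, and the escaping tables, are hypothetical. Since JavaScript has nothing like Python's locals(), the values are passed as an explicit object.

```javascript
// Hypothetical formatter table; the names mirror the "html" and
// "html-attr-value" modes from the quoted jsont example.
var FORMATTERS = {
  "html": function (s) {
    return String(s)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
  },
  "html-attr-value": function (s) {
    return FORMATTERS["html"](s)
      .replace(/"/g, "&quot;")
      .replace(/'/g, "&#39;");
  }
};

// Expands {name} and {name|formatter}. Any substitution that names no
// formatter gets defaultFormatter, so escaping happens unless the
// template author explicitly asks for something else.
function expand(template, data, defaultFormatter) {
  var pattern = /\{([A-Za-z_$][\w$]*)(?:\|([\w-]+))?\}/g;
  return template.replace(pattern, function (match, name, formatterName) {
    var fmtName = formatterName || defaultFormatter;
    if (!Object.prototype.hasOwnProperty.call(FORMATTERS, fmtName)) {
      throw new Error("unknown formatter: " + fmtName);
    }
    if (!Object.prototype.hasOwnProperty.call(data, name)) {
      throw new Error("missing value: " + name);
    }
    return FORMATTERS[fmtName](data[name]);
  });
}

var out = expand('{name}: <a href="{url|html-attr-value}">{anchor}</a>',
                 { name: "<b>Bob</b>", url: 'x" onclick="evil()',
                   anchor: "Tom & Jerry" },
                 "html");
// out is: &lt;b&gt;Bob&lt;/b&gt;: <a href="x&quot; onclick=&quot;evil()">Tom &amp; Jerry</a>
```

The default is chosen per call here; a library could just as well fix it per template or per application.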
>> I like the idea of "enabling DSLs", but I feel like this proposal is a
>> DSL itself, rather than enabling them, since it has a fairly
>> particular syntax, and you have defined the parse tree very
>> specifically.
>
> I'm not sure I follow. Are you referring to the `...` syntax with
> embedded $foo and ${expression} chunks?
Yes, I don't see why this should be hard-coded in the language. It's
a third set of escaping rules to learn (strings and regexes being the
first 2, and actually regexes have a fourth set -- inside character
classes [^$] and outside).
I also think the syntax is complicated (\, $, {, } and ` are special,
as opposed to strings where quotes and \ are special, and regexes where
/ is special). I wouldn't be at all surprised if it needs to grow to
handle new use cases.
For substitution, let me plug the JSON Template scheme: "{foo}" is a
substitution. If the string contains {}, then choose [] as the
metacharacters: Template("[foo]", meta="[]"). So the default
meta="{}".
That's it. IMHO this is the simplest possible scheme that covers all
applications. Any character you pick will be suboptimal for some DSL
-- in particular quasi-literals themselves. How do you write a
quasi-literal for quasi-literals? My guess is it will look pretty
nasty.
I don't see why the metadata needs to be inside the quasi-literal, as
opposed to just being another argument to a function that takes a
quasi-literal.
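A minimal sketch of the metacharacter idea, assuming a hypothetical expandTemplate(template, data, meta) in JavaScript. It is modeled on the Template("[foo]", meta="[]") call described above, not on JSON Template's actual implementation. The choice of delimiters is just another function argument, so nothing about it has to live inside the literal.

```javascript
// Escapes characters that are special inside a RegExp source string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical expandTemplate: meta is a two-character string giving the
// opening and closing metacharacters, "{}" by default.
function expandTemplate(template, data, meta) {
  meta = meta || "{}";
  if (meta.length !== 2) {
    throw new Error("meta must be exactly two characters");
  }
  var open = escapeRegExp(meta.charAt(0));
  var close = escapeRegExp(meta.charAt(1));
  // e.g. /\{([A-Za-z_$][\w$]*)\}/g for the default meta.
  var pattern = new RegExp(open + "([A-Za-z_$][\\w$]*)" + close, "g");
  return template.replace(pattern, function (match, name) {
    if (!Object.prototype.hasOwnProperty.call(data, name)) {
      throw new Error("missing value: " + name);
    }
    return String(data[name]);
  });
}

expandTemplate("{a}", { a: 1 });  // "1"
// Template text full of braces: switch the metacharacters to [].
expandTemplate("function [name]() { return [value]; }",
               { name: "f", value: 42 }, "[]");
// "function f() { return 42; }"
```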
>> Another Python analogy is that they chose not to embed regexes in the
>> language, unlike JavaScript/Perl/Ruby. Instead there is a very
>> minimal syntactic accommodation -- raw strings which don't have
>> backslash escaping. The Go language takes this same approach with
>> backticks I believe (e.g. `\s+` and not "\\s+").
>
> This proposal gets you raw strings easily :)
>
> new RegExp(r`\s+foo\s+`, 'i')
>
> function r(string) {
>   if (arguments.length != 1) { throw new Error(); }
>   return function () { return string; }; // Trivially inlinable
> }
I view the /\s+/ syntax for regexes as superfluous and overly
specific, so if this mechanism can somehow generalize that and retire
the old syntax, that's a plus.
>> I do think JavaScript really needs better string interpolation than
>> "foo " + var + " bar", which is unfortunately a common idiom. I think
>> that perhaps all that would be necessary is to have a .format() method
>> on strings, like Python. Python switched from the operator % to a
>> simple method.
>
> Yep. Except that python is planning on supporting the % operator for
> some time to come, right? One other nice side-effect of providing a
> generic platform for DSLs and doing formatting/interpolation in a
> library is that applications can have as many of these schemes side by
> side as they like, and when one becomes obsolete, you only have to
> deprecate library code instead of language syntax or core object
> methods.
I totally agree with that, but simply using a library gets even more
of those benefits. So far, the 2 things I see in quasi-literals that
you can't do with a library are:
1) The syntax -- however as mentioned I don't find the syntax to be a
benefit. Is html`foo$bar` better than html("foo{bar}")? I
personally like how ES5 introduced no new syntax.
2) The locals() thing. This would be a much smaller addition to the language.
Am I missing something? (could be)
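The .format() idea quoted earlier needs no new syntax either. A minimal sketch as a plain function follows; format is a hypothetical name, and it is deliberately a free function rather than a String.prototype method, so it can be adopted or retired without touching built-ins. Like Python's str.format(), it does no escaping of the substituted values.

```javascript
// Hypothetical format(): {0}, {1}, ... are replaced by the extra
// arguments; {{ and }} yield literal braces, following Python's
// str.format() convention.
function format(template) {
  var args = Array.prototype.slice.call(arguments, 1);
  return template.replace(/\{\{|\}\}|\{(\d+)\}/g, function (match, index) {
    if (match === "{{") { return "{"; }
    if (match === "}}") { return "}"; }
    var i = Number(index);
    if (i >= args.length) {
      throw new Error("no argument for {" + index + "}");
    }
    return String(args[i]);
  });
}

format("foo {0} bar", "baz");  // "foo baz bar"
format("{{0}} is {0}", 7);     // "{0} is 7"
```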
> One oft ignored criterion for judging string formatting schemes is how
> resistant they are to quoting confusion.
> The python3 string formatting is really bad in this respect. Its
> security considerations section does a good job of pointing out that
> formatting strings from untrusted sources are a problem and should not
> be used, but does not mention the other side of the problem --
> substitution values from untrusted sources.
100% agree. But, introducing another syntax leads to its own kind of
"quoting confusion". As mentioned, how does a quasi-literal for a
quasi-literal look, or a regex for a quasi-literal, or a quasi-literal
for a regex?
How hard is it to write a program to extract all quasi-literals from JS
source, and analyze them statically? New syntax makes this kind of
thing more complicated. As it is, JS is not too hard to parse.
> Reference 2 in the proposal argues that requiring developers to
> specify an escaping scheme, or do it manually, is shifting a large and
> unnecessary burden onto them, and when they make errors, those errors
> often result in vulnerabilities.
So this is "auto-escaping", right? In my example, this is saying that
you automatically detect based on the literal portions whether you
need to use the "html" escape or "html-attr-value" escape. Most
template languages don't do this. But there is no reason that they
can't. I don't think the JS language is the barrier to doing this
now.
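To illustrate that auto-escaping can live in a library, here is a deliberately crude sketch. It assumes a hypothetical autoEscapeHtml(literals, values) that receives the literal portions and substitution values as separate arrays (similar in spirit to what a quasi-literal handler would be given, but callable from ES3/5 code). It tracks whether each substitution falls in text or inside a quoted attribute value and picks an "html" or "html-attr-value" style escape accordingly. It ignores comments, script and style bodies, and URL-valued attributes such as href, where HTML escaping alone does not stop a javascript: URL, so it is not a safe escaper as written.

```javascript
function escapeHtmlText(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function escapeHtmlAttr(s) {
  return escapeHtmlText(s).replace(/"/g, "&quot;").replace(/'/g, "&#39;");
}

// Advances a tiny HTML context machine over literal text.
// States: "text", "tag" (inside <...> but not in a quoted value),
// "attr-dq" / "attr-sq" (inside a double- / single-quoted attribute value).
function nextState(state, literal) {
  for (var i = 0; i < literal.length; i++) {
    var c = literal.charAt(i);
    if (state === "text") {
      if (c === "<") { state = "tag"; }
    } else if (state === "tag") {
      if (c === ">") { state = "text"; }
      else if (c === '"') { state = "attr-dq"; }
      else if (c === "'") { state = "attr-sq"; }
    } else if (state === "attr-dq") {
      if (c === '"') { state = "tag"; }
    } else if (state === "attr-sq") {
      if (c === "'") { state = "tag"; }
    }
  }
  return state;
}

// Escaped values cannot move the machine to another state (text-context
// values lose < and >, attribute values also lose quotes), so only the
// literal portions are scanned.
function autoEscapeHtml(literals, values) {
  if (literals.length !== values.length + 1) {
    throw new Error("expected one more literal portion than values");
  }
  var out = literals[0];
  var state = nextState("text", literals[0]);
  for (var i = 0; i < values.length; i++) {
    if (state === "text") {
      out += escapeHtmlText(values[i]);
    } else if (state === "attr-dq" || state === "attr-sq") {
      out += escapeHtmlAttr(values[i]);
    } else {
      // Unquoted attribute or tag name: refuse rather than guess.
      throw new Error("substitution inside a tag but outside quotes");
    }
    out += literals[i + 1];
    state = nextState(state, literals[i + 1]);
  }
  return out;
}

var html = autoEscapeHtml(['<a href="', '">', '</a>'],
                          ['x" onclick="evil()', 'Tom & Jerry']);
// html is: <a href="x&quot; onclick=&quot;evil()">Tom &amp; Jerry</a>
```

The escape for each hole is decided from the literal text alone, which is the same information a quasi-literal handler would have.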
thanks,
Andy
More information about the es-discuss
mailing list