quasi-literal strawman

Mike Samuel mikesamuel at gmail.com
Sat Dec 19 10:02:14 PST 2009


2009/12/18 Andy Chu <andy at chubot.org>:
>>> So then my question is why it needs to specify a desugaring.  Why is a
>>> quasi-literal not a string?
>>
>> I still don't understand the question?
>>
>> Why `foo$bar` and not "foo$bar"?  Well, the latter doesn't do anything
>> useful with the expression (bar).
>
> OK, so to back up a bit, the point of quasi-literals is to shift the
> burden of escaping from the application developer to the library
> author (e.g. authors of the DOM API).  Is that an accurate and
> reasonably complete summary?  (If it is I might suggest making "safe
> string interpolation" the feature name)

"Safe String Interpolation" is the name of a quasi scheme that I would
also like to propose.  And its goal is as you describe.
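A minimal sketch of why the desugaring matters. It assumes, for illustration only, that a quasi-literal with the body `<b>$user</b>` desugars into a call that receives the literal parts and the substituted values as separate arrays. The handler name `safeHtml` and that argument shape are hypothetical, and the strawman's exact desugaring may differ.

```typescript
// Assumed desugaring (illustrative only) of a quasi-literal whose body is
// <b>$user</b>:   safeHtml(["<b>", "</b>"], [user])

// Escape a value for HTML text content.
function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical handler: the literal parts were written by the developer and
// are trusted; only the substituted values are escaped.
function safeHtml(literals: readonly string[], values: readonly unknown[]): string {
  let out = literals[0];
  values.forEach((value, i) => {
    out += escapeHtml(value) + literals[i + 1];
  });
  return out;
}

const user = "<script>alert(1)</script>";
// An ordinary string never looks at `user`; "$user" is just five characters.
const plain = "<b>$user</b>";
// The desugared call sees the value itself and can escape it.
const rendered = safeHtml(["<b>", "</b>"], [user]);
```

This escaper is only appropriate for HTML text and quoted attribute values; an auto-escaping handler would choose a different escaper for script, style, or URL positions.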


> My question is why the entire quasi-literal scheme can't be in a
> library.  In your doc you have something like:

It can be done in a library but some syntactic sugar will make it much
more usable.


> new StringInterpolation(["SELECT * FROM TABLE WHERE name='", name, "'
> AND modified > ", new Date(d)])
>
> Does this variation express all of what quasi-literals can do?

Since "new StringInterpolation" could be defined to do anything that a
single user call can do, yes.
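The array form above can be sketched as an ordinary library class, which is consistent with the point that such a constructor can do anything a single call can. The class body, the even/odd convention, and the quote-doubling escaper are illustrative assumptions; production code should prefer a database driver's parameterized queries over string escaping.

```typescript
// Hypothetical library class for the array form: even indices hold trusted
// literal text written by the developer, odd indices hold untrusted values.
class StringInterpolation {
  readonly parts: readonly unknown[];

  constructor(parts: readonly unknown[]) {
    this.parts = parts;
  }

  // The consuming library (not the application author) supplies the escaper,
  // and it is applied only to the values.
  render(escapeValue: (value: unknown) => string): string {
    return this.parts
      .map((part, i) => (i % 2 === 0 ? String(part) : escapeValue(part)))
      .join("");
  }
}

// Illustrative SQL string-literal escaper that doubles single quotes.
const sqlQuote = (value: unknown): string => String(value).replace(/'/g, "''");

const userName = "O'Brien";
const stmt = new StringInterpolation([
  "SELECT * FROM TABLE WHERE name='", userName, "'",
]);
const sql = stmt.render(sqlQuote);
```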


> var sqlStatement = new StringInterpolation("SELECT * FROM TABLE WHERE
> name=$name AND modified > $date", {name: name, date: new Date(d)})
>
> sqlLibrary.execute(sqlStatement)  // does "autoescaping" of name and date
>
> The syntax of the first string argument is the quasi-literal syntax in
> your doc, with $, {}, etc.
>
> Reasons why I ask:
> - People could use this scheme *now* if it were a library, and start
> changing their APIs to autoescape.

That's one reason to use desugaring.  People could use it now, and use
a tool that does the desugaring, as long as the library is written in
the intersection of ES5 strict and ES3.
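As a sketch of something usable today without new syntax, here is the named variant quoted above (`$name` placeholders plus an object of values) written as plain functions. `toSql` is a hypothetical stand-in for the autoescaping that the hypothetical `sqlLibrary.execute` would perform; the `$identifier` grammar, the quoting rule, and the Date handling are assumptions. The template string itself is treated as trusted, so this design still has the "formatting string from untrusted source" problem raised below if the template is assembled from user input.

```typescript
// Hypothetical parser for the named form: "$identifier" marks a substitution
// that is looked up in the supplied object; all other text is trusted.
function splitNamed(template: string): { literals: string[]; names: string[] } {
  const literals: string[] = [];
  const names: string[] = [];
  const re = /\$([A-Za-z_]\w*)/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(template)) !== null) {
    literals.push(template.slice(last, m.index));
    names.push(m[1]);
    last = m.index + m[0].length;
  }
  literals.push(template.slice(last));
  return { literals, names };
}

// Stand-in for the autoescaping step: every value becomes a quoted SQL string
// literal, with Dates rendered as ISO 8601 timestamps.
function toSql(template: string, values: Record<string, unknown>): string {
  const { literals, names } = splitNamed(template);
  let out = literals[0];
  names.forEach((key, i) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error("missing value for $" + key);
    }
    const v = values[key];
    const text = v instanceof Date ? v.toISOString() : String(v);
    out += "'" + text.replace(/'/g, "''") + "'" + literals[i + 1];
  });
  return out;
}

const query = toSql(
  "SELECT * FROM TABLE WHERE name=$name AND modified > $date",
  { name: "O'Brien", date: new Date(Date.UTC(2009, 11, 18)) },
);
```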

> - Quasi-literals have a bit of a meta-problem.  You're lamenting the
> complex escaping rules of HTML (rightly), but then this proposal adds
> a third escaping mode to JS, which is probably the most
> programmatically-generated language on the planet (since it must go
> over the network).  Certainly I'm already confused by the discussions
> of escaping quasi-literals on this thread.

This shouldn't affect existing correct code generators: JS currently has
no backquoted strings, so a generator that emits them can't be correct.


>> PHP and JSP were the gold standard when I built it, and Django and
>> others have addressed that to some degree.
>> Do you know of any statistics on how much PHP code is running versus
>> Django code?
>
> I don't know, but certainly tons more PHP code.  BTW Smarty also does
> this and I think this is the most common PHP templating language now:
> http://www.smarty.net/manual/en/language.modifier.escape.php
> (I don't understand why a templating language needs a templating
> language, but I'll never understand PHP I suppose)

I think it's partly because people want to migrate incrementally away
from PHP's string interpolation scheme, which makes it easy to write
almost-correct code and very hard to write correct code.

> I think the quasi-literals are a cool idea -- but they're also a
> pretty large innovation.  This scheme is not used by any "production"
> library or language that I know of.  The point being that innovation
> in standards has the problems that Douglas Crockford has spoken about.

I absolutely agree and I'm not proposing we innovate in the standard.
That's why I hoped to standardize on a simple desugaring that is
syntactically familiar to users of existing languages, and that will
allow experimentation by library authors who, as you point out, have
the best track record of successful innovation.

> This would shift the boundaries between JavaScript and the DOM, and
> make them an anomaly among literally hundreds of other
> libraries/languages.  Template languages are a well known commodity by
> now.

I don't understand what you mean by shifting the boundaries.  Strings
are not the de facto standard for moving data across module
boundaries.  There are many languages that pass around structured
content.
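To illustrate passing structured content rather than bare strings across module boundaries, here is a sketch in which an HTML handler returns a wrapper object instead of a string, so already-safe fragments can be nested without being escaped twice. `SafeHtml` and `html` are hypothetical names chosen for illustration, and the handler is called in an assumed desugared form (literal parts and values as separate arrays).

```typescript
// Hypothetical wrapper: content already known to be safe HTML travels as an
// object, so a consumer can tell it apart from an arbitrary string.
class SafeHtml {
  readonly html: string;
  constructor(html: string) {
    this.html = html;
  }
  toString(): string {
    return this.html;
  }
}

// Escape plain text for an HTML text or quoted-attribute context.
function escapeText(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// SafeHtml values are inserted verbatim; anything else is escaped as text.
function html(literals: readonly string[], values: readonly unknown[]): SafeHtml {
  let out = literals[0];
  values.forEach((v, i) => {
    out += (v instanceof SafeHtml ? v.html : escapeText(String(v))) + literals[i + 1];
  });
  return new SafeHtml(out);
}

const item = html(["<li>", "</li>"], ["a < b"]);
const list = html(["<ul>", "</ul>"], [item]); // item is not escaped again
```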

> I would also say that the biggest boundary is actually getting people
> to write auto-escaping.

I'm not sure I understand.  I never said I wanted a bunch of people
writing auto-escaping code.  It needs to be done by at least one
library.
As for clients, many people write templates using a variety of
syntaxes.  The "auto" in "auto-escaping" means they just have to do in
JS what they now do in other languages.

> It's not the lack of syntax in the JS language.

I disagree.  I find it much easier to knock out code in Perl or PHP
that is almost right.


> I didn't study the end of your doc, but it certainly isn't
> a simple problem for someone to write HTML autoescaping.  Considering
> how much HTML is malformed out there, I don't see how there won't be
> holes 0.1% or 0.01% of the time due to heuristics.

I believe I can demonstrate that it is that simple, but that is out of
scope for this thread.
If we can reduce XSS, the single largest source of vulnerabilities in
web applications, to 0.1% or 0.01% of its current size, then that
alone will have made this change worthwhile.

> I would rather use my templating language and the equivalent of auto-escaping via static
> analysis to get it 100% right (since a static analysis tool can issue
> warnings where there is ambiguity and a human applies their judgement,
> to fix their code before it has to run).

I am skeptical that there is an easy migration path for existing JS
code to something that is amenable to the kind of static analysis you
describe, but I would love to be proven wrong.  I don't see anything
on the JSON Template page about static analysis, though.  Am I missing
something?  Or is that more for the other languages in which it can be
embedded than JS?

With a bit of generally useful syntactic sugar, your templating
scheme, safe string interpolation, and others can all compete as
libraries.


> JSON Template is trivially tokenizable for this reason: \{.*?\} (or
> e.g. \[.*?\] depending on the delimiter) splits it into literals and
> substitutions.  Then you can put the work into autoescaping rather
> than parsing the template language (which always sucks because of
> escaping!).
>
>> What is locals()?  Why does expandTemplate need access to all locals
>> to do its job instead of just the specified ones?
>> Does this suffer from the "formatting string from untrusted source"
>> problem that python suffers from, and the "substitution value from
>> untrusted source" problem?
>
> That was just a shorthand to get around the verbosity of passing in
> {name: name, date: date}, and also specifying name and date in the
> quasi-literal string.  It's not strictly necessary.
>
> Not saying that quasi-literals won't work, but things to consider.
> It's an ambitious solution and it would be cool if we can forget about
> escaping as application developers, but I see more than a few
> obstacles.
>
> I think an existing library would help firm it up... I was thinking of
> writing a mini-template.js that implements almost exactly what Python
> 3k string formatting does.  This is a hole in the feature set of
> JavaScript, irrespective of security and escaping.  I think a default
> .format() method on strings would go a long way, rather than the
> current "foo " + var + " bar" idiom.  Perhaps there could be a hook
> for autoescaping, but it may be tricky.
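The tokenization described above (splitting on `\{.*?\}`) can be sketched with String.prototype.split: when the separator regex contains a capture group, the matched delimiters are kept in the result, so even indices are literal text and odd indices are substitutions. The function name and its delimiter parameters are illustrative; this is not JSON Template's actual implementation. As with the quoted regex, `.` does not match newlines, so a substitution cannot span lines.

```typescript
// Escape characters that are special in a RegExp so delimiters match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split a template into literal text and substitution bodies. Because the
// separator has a capture group, split() keeps the delimiters, so even
// indices are literals and odd indices are substitutions.
function tokenize(
  template: string,
  open = "{",
  close = "}",
): { literals: string[]; substitutions: string[] } {
  const re = new RegExp("(" + escapeRegExp(open) + ".*?" + escapeRegExp(close) + ")");
  const pieces = template.split(re);
  const literals = pieces.filter((_, i) => i % 2 === 0);
  const substitutions = pieces
    .filter((_, i) => i % 2 === 1)
    .map((s) => s.slice(open.length, s.length - close.length));
  return { literals, substitutions };
}

const tokens = tokenize("Hello {name}, you have {count} messages");
const bracketed = tokenize("Hi [user]!", "[", "]");
```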

If you would like to write up a String.prototype.format strawman and
post it at http://wiki.ecmascript.org/doku.php?id=strawman:strawman
then it can be considered alongside competing strawmen like this
proposal.
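A rough sketch of the Python 3 style formatting mentioned above, written as a free function rather than a String.prototype.format method to avoid patching a built-in. It handles positional `{0}` and named `{key}` fields and the `{{` / `}}` brace escapes, and takes an optional per-value hook of the kind suggested for autoescaping. Python's format-spec mini-language (`{0:>10}` and the like) is not supported, and the name `format` and its signature are hypothetical.

```typescript
// Hypothetical Python 3 style formatter (not an existing JavaScript API).
// "{0}" / "{key}" are replaced from args, "{{" and "}}" produce literal
// braces, and escapeValue is applied to every substituted value.
function format(
  template: string,
  args: readonly unknown[] | Record<string, unknown>,
  escapeValue: (value: unknown) => string = String,
): string {
  const source = args as Record<string, unknown>;
  return template.replace(/\{\{|\}\}|\{(\w+)\}/g, (match: string, key?: string) => {
    if (match === "{{") return "{";
    if (match === "}}") return "}";
    // Own properties only, so "{toString}" does not reach Object.prototype.
    if (key === undefined || !Object.prototype.hasOwnProperty.call(source, key)) {
      throw new Error("no value for field " + match);
    }
    return escapeValue(source[key]);
  });
}

const sum = format("{0} + {0} = {1}", [2, 4]);
const greeting = format("Hello, {name}!", { name: "Andy" });
const braces = format("{{literal}} {x}", { x: 1 });
const escaped = format("<p>{0}</p>", ["a<b"], (v) => String(v).replace(/</g, "&lt;"));
```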

> Andy
>

