Comments on April ES5 final draft standard tc39-2009-025

John Cowan cowan at ccil.org
Sun Apr 26 13:22:33 PDT 2009


David-Sarah Hopwood scripsit:

> > Case 1 is what's needed for Persian and for Hindi and various other
> > Indic-script languages.
> 
> So let's do case 1.

It turns out that Unicode 5.1 has done the heavy lifting; the bad news is
that the lifting is indeed heavy.  You want to allow Cf (format control)
characters if and only if they actually make a semantic distinction in
contemporary use.  According to Unicode 5.1, that allows only U+200C
(ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER), and then only in
certain contexts: the rules involve knowing the Script and Joining_Type
properties of nearby identifier characters.  Details at
http://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters .

My recommendation is to allow U+200C and U+200D as IdentifierParts,
but disallow all other Cfs.

> I assume that format control characters should be allowed in both
> IdentifierPart and IdentifierStart?

In no case are these characters required as IdentifierStarts.
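
To make the recommendation concrete, here is a rough sketch of the check,
not proposed spec text.  It assumes a JavaScript engine with Unicode
property escapes in regular expressions (much later than ES5), and the
names ID_START, ID_PART and isIdentifierName are made up for illustration.
It ignores \uXXXX escapes and reserved words, and it does not attempt the
contextual Script/Joining_Type rules from TR31: it just admits U+200C and
U+200D anywhere after the first character and rejects every other Cf.

```javascript
// Sketch only. The names below are hypothetical, not from the ES5 draft.

// Identifier start: "$", "_", or a character in categories L* or Nl.
var ID_START = /^[$_\p{L}\p{Nl}]$/u;

// Identifier part: the above plus Mn, Mc, Nd, Pc, and exactly two
// Cf characters: U+200C (ZWNJ) and U+200D (ZWJ). No other Cf is listed,
// so all other format control characters are rejected.
var ID_PART = /^[$_\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200C\u200D]$/u;

function isIdentifierName(s) {
  var chars = Array.from(s); // split by code point, not by UTF-16 unit
  if (chars.length === 0 || !ID_START.test(chars[0])) {
    return false; // empty, or first character is not a valid start
  }
  return chars.slice(1).every(function (c) {
    return ID_PART.test(c);
  });
}
```

With this sketch, "a\u200Cb" is accepted, while "\u200Ca" (joiner in start
position) and "a\u200Eb" (a different Cf, LEFT-TO-RIGHT MARK) are rejected.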

-- 
John Cowan    http://ccil.org/~cowan    cowan at ccil.org
Economists were put on this planet to make astrologers look good.
        --Leo McGarry

