Last call for consensus on format-control char. issues
douglas at crockford.com
Wed Jun 17 10:23:38 PDT 2009
Allen Wirfs-Brock wrote:
> Waldemar's notes from the May F2F don't record any decision on the issue of <ZWNJ> and <ZWJ> in identifiers. However, my personal notes say that I need to "keep in identifiers and fix grammar" which is also my recollection of what we decided at the meeting.
> The simplest implementation of that decisions is to simply add <ZWNJ> and <ZWJ> as alternatives for IdentifierPart. In addition, the text in section 7.1 that says that format control characters can occur in identifier presumably needs to be narrowed to say only <ZWNJ> and <ZWJ>.
> At about the same time as the F2F David-Sarah made a more comprehensive proposal (duplicated below) that in addition to addressing <ZWNJ> and <ZWJ> also significantly refines the rules for <BOM> including excluding them from strings literals and regular expressions and making it a syntax error for a <BOM> to appear within an identifier.
> I'm not a Unicode expert, but my sense is that David-Sarah's proposal is sound and probably consistent with the original goals of cleaning up class Cf in the specification. However, his rules for <BOM> also seem like they could significantly complicate the lexical analysis phase of implementations.
> My sense from the F2F is that the consensus was more in the direction of my simple solution above (<ZWNJ> and <ZWJ> in identifiers, <BOM> is whitespace) rather than David-Sarah's more comprehensive treatment of <BOM>.
> I need to have a final decision on this so I can update the draft accordingly. Based upon my recollection of the F2F I'm going to go with the "simple solution" unless there is apparent consensus otherwise.
> Final thoughts?
BOM should be whitespace. It should be allowed in string literals.
More information about the es5-discuss