BOM inside tokens
Ash Berlin
ash_es4 at firemirror.com
Tue Jul 15 10:42:17 PDT 2008
On 15 Jul 2008, at 18:39, Ash Berlin wrote:
>
> On 15 Jul 2008, at 18:22, Igor Bukanov wrote:
>
>> The currently proposed rule for byte-order-mark (BOM) characters in
>> ES4 sources is to replace them by whitespace outside of tokens. But
>> what exactly are the tokens in a case like -<bom>-?
>>
>> AFAICS it would be treated as - - turning cases like:
>> -<bom>-a;
>> into
>> - -a;
>> versus
>> --a;
>> which is what current ES3 implementations would give.
>>
>> Regards, Igor
>
> Hmmm. According to the UnicodeCheck app on my Mac (and thus to one
> version or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH
> NO-BREAK SPACE'
>
> • NamesList:
> = BYTE ORDER MARK (BOM), ZWNBSP
> • may be used to detect byte order by contrast with the
> noncharacter code point FFFE
> • use as an indication of non-breaking is deprecated; see 2060
> instead
> → (zero width space - 200B)
> → (word joiner - 2060)
> → (<not a character> - FFFE)
> • Designated in Unicode 1.1
>
> I'd say that a BOM should be treated just like any ordinary whitespace
> char - namely that it should be invalid in spaces, and beyond that why
> is any conversion needed, since it's a valid Unicode character...
>
That should read: invalid in *identifiers*
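
To make Igor's -<bom>-a; case concrete, here is a rough sketch using a toy tokenizer. It is not taken from any real ES3 or ES4 implementation; the names tokenize, bomAsWhitespace and bomStripped are made up for illustration, and the two policies are only my reading of "replace by whitespace" versus the ES3 behaviour Igor describes.

```javascript
// Toy tokenizer for illustration only. It knows "--", "-", ";",
// simple ASCII identifiers and ASCII whitespace, nothing else.
const BOM = "\uFEFF";

function tokenize(src) {
  const tokens = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    if (ch === " " || ch === "\t" || ch === "\n") {
      i += 1; // whitespace only separates tokens
    } else if (src.startsWith("--", i)) {
      tokens.push("--"); // longest match: prefer "--" over "-"
      i += 2;
    } else if (ch === "-" || ch === ";") {
      tokens.push(ch);
      i += 1;
    } else if (/[A-Za-z_$]/.test(ch)) {
      let j = i + 1;
      while (j < src.length && /[A-Za-z0-9_$]/.test(src[j])) j += 1;
      tokens.push(src.slice(i, j));
      i = j;
    } else {
      throw new SyntaxError("unexpected character in toy tokenizer");
    }
  }
  return tokens;
}

// Policy A (assumed reading of the proposed rule): a BOM becomes a space.
function bomAsWhitespace(src) {
  return src.split(BOM).join(" ");
}

// Policy B (assumed model of the ES3 behaviour Igor describes): a BOM is dropped.
function bomStripped(src) {
  return src.split(BOM).join("");
}

const source = "-" + BOM + "-a;";
const asWhitespace = tokenize(bomAsWhitespace(source));
const stripped = tokenize(bomStripped(source));

console.log(asWhitespace); // two separate "-" tokens, then "a" and ";"
console.log(stripped);     // one "--" token, then "a" and ";"
```

Under policy A the source reads as two unary minuses applied to a, under policy B as a pre-decrement of a, so the same bytes would mean different things depending on which rule an implementation follows.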