BOM inside tokens

Ash Berlin ash_es4 at firemirror.com
Tue Jul 15 10:42:17 PDT 2008


On 15 Jul 2008, at 18:39, Ash Berlin wrote:

>
> On 15 Jul 2008, at 18:22, Igor Bukanov wrote:
>
>> The currently proposed rule for byte-order-mark (BOM) characters in
>> ES4 sources is to replace them with whitespace outside of tokens. But
>> what exactly are the tokens in a case like -<bom>-?
>>
>> AFAICS it would be treated as "- -", turning cases like:
>> -<bom>-a;
>> into
>> - -a;
>> versus
>> --a;
>> which is what current ES3 implementations produce.
>>
>> Regards, Igor
>
> Hmmm. According to the UnicodeCheck app on my Mac (and thus to one
> version or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH
> NO-BREAK SPACE':
>
> •	NamesList:
> 		= BYTE ORDER MARK (BOM), ZWNBSP
> 		• may be used to detect byte order by contrast with the noncharacter code point FFFE
> 		• use as an indication of non-breaking is deprecated; see 2060 instead
> 		→ (zero width space - 200B)
> 		→ (word joiner - 2060)
> 		→ (<not a character> - FFFE)
> •	Designated in Unicode 1.1
>
> I'd say that a BOM should be treated just like any ordinary whitespace
> char, namely that it should be invalid in spaces. Beyond that, why is
> any conversion needed, since it's a valid Unicode character...
>

Correction: I meant invalid in *identifiers*, not in spaces.
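To make the two readings in this thread concrete, here is a minimal Python sketch. It assumes the proposed ES4 rule amounts to substituting a space for every BOM before lexing, and it models the ES3 behaviour as removing every format-control character (Unicode category Cf, which is the category U+FEFF belongs to) before applying the lexical grammar. The toy lexer and the helper names (tokenize, es3_strip_format_chars, bom_as_space) are hypothetical and recognize only the few tokens needed for these examples.

```python
import re
import unicodedata

BOM = "\ufeff"  # U+FEFF, named ZERO WIDTH NO-BREAK SPACE, category Cf

# Hypothetical toy lexer: whitespace, "--" (tried before "-" for maximal
# munch), "-", simple identifiers, and ";". Nothing else is recognized.
TOKEN_RE = re.compile(r"\s+|--|-|[A-Za-z_$][A-Za-z0-9_$]*|;")


def tokenize(src: str) -> list[str]:
    """Split src into tokens, dropping whitespace runs."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if not m.group().isspace():
            tokens.append(m.group())
        pos = m.end()
    return tokens


def es3_strip_format_chars(src: str) -> str:
    # Model of ES3: format-control (Cf) characters are removed before lexing.
    return "".join(ch for ch in src if unicodedata.category(ch) != "Cf")


def bom_as_space(src: str) -> str:
    # Assumed reading of the proposed ES4 rule: each BOM becomes a space.
    return src.replace(BOM, " ")


if __name__ == "__main__":
    print(unicodedata.name(BOM), unicodedata.category(BOM))
    src = "-" + BOM + "-a;"
    print(tokenize(es3_strip_format_chars(src)))  # ['--', 'a', ';']
    print(tokenize(bom_as_space(src)))            # ['-', '-', 'a', ';']
    ident = "a" + BOM + "b"
    print(tokenize(es3_strip_format_chars(ident)))  # ['ab']
    print(tokenize(bom_as_space(ident)))            # ['a', 'b']
```

Under the stripping model the BOM vanishes, so "-<bom>-a;" lexes as a decrement and "a<bom>b" as one identifier; under the space model both split in two, which is exactly the difference Igor raises and the reason it matters whether a BOM is allowed inside identifiers.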





More information about the Es4-discuss mailing list