BOM inside tokens
Ash Berlin
ash_es4 at firemirror.com
Tue Jul 15 10:39:31 PDT 2008
On 15 Jul 2008, at 18:22, Igor Bukanov wrote:
> The currently proposed rule for byte-order-mark (BOM) characters in
> ES4 sources is to replace them by whitespace outside of tokens. But
> what exactly are the tokens in a case like -<bom>-?
>
> AFAICS it would be treated as - -, turning cases like:
> -<bom>-a;
> into
> - -a;
> versus
> --a;
> which is what current ES3 implementations produce.
>
> Regards, Igor
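To make the difference Igor describes concrete, here is a toy lexer sketch. It is not the ES3 or ES4 lexical grammar: it only knows `-`, `--`, identifiers and `;`, and the `tokenize` function and its `"strip"`/`"whitespace"` mode names are hypothetical. The `"strip"` mode follows ES3 section 7.1, which removes format-control (Cf) characters such as U+FEFF before the lexical grammar is applied. The `"whitespace"` mode follows the proposed ES4 rule of turning a BOM into whitespace. Because `--` is matched by maximal munch, the two rules split `-<bom>-a;` differently.

```python
import unicodedata

BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE / byte order mark, category Cf


def tokenize(src: str, bom_mode: str) -> list[str]:
    """Hypothetical toy lexer for '-', '--', identifiers and ';' only."""
    if bom_mode == "strip":
        # ES3-style: drop every format-control (Cf) character before lexing.
        src = "".join(ch for ch in src if unicodedata.category(ch) != "Cf")
    elif bom_mode == "whitespace":
        # Proposed ES4-style: a BOM becomes an ordinary space (a token separator).
        # Note that str.isspace() is False for U+FEFF, so replace it explicitly.
        src = src.replace(BOM, " ")
    else:
        raise ValueError(f"unknown bom_mode {bom_mode!r}")

    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1  # whitespace only separates tokens
        elif src.startswith("--", i):
            tokens.append("--")  # maximal munch: prefer the longer punctuator
            i += 2
        elif ch in "-;":
            tokens.append(ch)
            i += 1
        elif ch.isalpha() or ch in "_$":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] in "_$"):
                j += 1
            tokens.append(src[i:j])
            i = j
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


print(tokenize("-\ufeff-a;", "strip"))       # ['--', 'a', ';']       i.e. --a;
print(tokenize("-\ufeff-a;", "whitespace"))  # ['-', '-', 'a', ';']   i.e. - -a;
```

With the BOM stripped, the two minus signs fuse into a decrement; with the BOM treated as whitespace, they stay two unary minus operators, which is the change in meaning Igor points out.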
Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version
or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK
SPACE':
• NamesList:
= BYTE ORDER MARK (BOM), ZWNBSP
• may be used to detect byte order by contrast with the
noncharacter code point FFFE
• use as an indication of non-breaking is deprecated; see 2060
instead
→ (zero width space - 200B)
→ (word joiner - 2060)
→ (<not a character> - FFFE)
• Designated in Unicode 1.1
I'd say that a BOM should be treated just like any ordinary whitespace
character, namely that it should be invalid inside tokens. Beyond that, why
is any conversion needed at all, since it's a valid Unicode character...
-ash