BOM inside tokens

Ash Berlin ash_es4 at firemirror.com
Tue Jul 15 10:39:31 PDT 2008


On 15 Jul 2008, at 18:22, Igor Bukanov wrote:

> The currently proposed rule for byte-order-mark (BOM) characters in
> ES4 sources is to replace them by whitespace outside of tokens. But
> what is exactly the tokens in a case like -<bom>-?
>
> AFAICS it would be treated as - - turning cases like:
>  -<bom>-a;
> into
>  - -a;
> versus
>  --a;
> that would be with current ES3 implementations.
>
> Regards, Igor

Hmmm. According to the UnicodeCheck app on my Mac (and thus to one version
or other of the Unicode spec), a BOM (U+FEFF) is 'ZERO WIDTH NO-BREAK
SPACE':

•	NamesList:
		= BYTE ORDER MARK (BOM), ZWNBSP
		• may be used to detect byte order by contrast with the noncharacter code point FFFE
		• use as an indication of non-breaking is deprecated; see 2060 instead
		→ (zero width space - 200B)
		→ (word joiner - 2060)
		→ (<not a character> - FFFE)
•	Designated in Unicode 1.1

I'd say that a BOM should be treated just like any ordinary whitespace
char - namely that it should be invalid in spaces, and beyond that, why is
any conversion needed, since it's a valid Unicode character?
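
To make Igor's example concrete, here is a rough sketch of the two behaviours being compared. The helper names (stripBom, bomToSpace, tokenize) are made up for illustration and are not taken from any ES3 or ES4 implementation, and the toy tokenizer only knows "--", "-", ";" and identifiers. It assumes an engine that supports the sticky RegExp flag.

```javascript
// Hypothetical helpers for illustration only; not from any real engine.
const BOM = "\uFEFF";

// Policy A: drop BOM characters before tokenizing
// (the behaviour Igor attributes to current ES3 implementations).
function stripBom(source) {
  return source.split(BOM).join("");
}

// Policy B: replace each BOM with a plain space
// (the proposed ES4 rule as described in the quoted message).
function bomToSpace(source) {
  return source.split(BOM).join(" ");
}

// Toy tokenizer: tries "--" before "-", so adjacent minus signs
// form a single decrement token. Whitespace is listed explicitly
// so that a raw BOM is NOT silently accepted as whitespace here.
function tokenize(source) {
  const tokens = [];
  const pattern = /([ \t\r\n]+)|--|-|;|[A-Za-z_$][A-Za-z0-9_$]*/y;
  let pos = 0;
  while (pos < source.length) {
    pattern.lastIndex = pos;
    const match = pattern.exec(source);
    if (match === null) {
      throw new SyntaxError("Unexpected character at offset " + pos);
    }
    pos = pattern.lastIndex;
    // Group 1 is set only for whitespace, which separates tokens
    // but is not itself emitted.
    if (match[1] === undefined) {
      tokens.push(match[0]);
    }
  }
  return tokens;
}

const source = "-" + BOM + "-a;";
console.log(tokenize(stripBom(source)));   // [ '--', 'a', ';' ]
console.log(tokenize(bomToSpace(source))); // [ '-', '-', 'a', ';' ]
```

With the BOM stripped, the two minus signs fuse into one "--" token; with the BOM turned into a space, they stay two separate "-" tokens, which is the difference Igor is pointing at.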

-ash

