BOM in script sources

Lars T Hansen lth at acm.org
Tue Jan 9 06:24:14 PST 2007


Section 7.1 of E262-3 requires all format control (class Cf)  
characters to be stripped from the source before the program is  
compiled.  Opera has never done this, and is actually at fault here.   
Mea culpa.
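
To make the requirement concrete, here is a minimal sketch of such a stripping pass. It assumes an engine with Unicode property escapes (\p{Cf}), which arrived long after E262-3, and stripFormatControl is a hypothetical name for illustration, not a function from the spec or from any browser:

```javascript
// Hypothetical helper: removes every Unicode class Cf (format control)
// character, modelling the pre-compilation step in E262-3 section 7.1.
function stripFormatControl(source) {
  return source.replace(/\p{Cf}/gu, "");
}

// U+FEFF is class Cf, not class Zs, so it is not a <USP> space separator.
const isFormatControl = /\p{Cf}/u.test("\uFEFF");
const isSpaceSeparator = /\p{Zs}/u.test("\uFEFF");

// A U+FEFF sitting between tokens disappears once the pass has run.
const withBom = "var a\uFEFF = 1;";
const cleaned = stripFormatControl(withBom);

console.log(isFormatControl, isSpaceSeparator, cleaned);
```

With a pass like this in front of the compiler, the stray U+FEFF in the reported case would never reach the tokenizer.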

The ECMAScript 4 committee has since concluded that the requirement
to strip class Cf characters is a bug in the spec (people want to
have regexes and strings containing those characters literally) and
ECMAScript 4 will not contain that requirement.  See
http://developer.mozilla.org/es4/proposals/update_unicode.html.
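
The problem with literals can be sketched as follows. This is an illustration under assumptions, not the committee's own example: FORMAT_CONTROL and literalOf are hypothetical names, and the regex relies on Unicode property escapes from much later editions of the language:

```javascript
// Hypothetical model of the E262-3 section 7.1 rule: drop all class Cf
// characters from the source text before it is compiled.
const FORMAT_CONTROL = /\p{Cf}/gu;

// Source text whose string literal contains a real U+200D
// (ZERO WIDTH JOINER, class Cf) between "a" and "b".
const source = 'var s = "a\u200Db";';
const stripped = source.replace(FORMAT_CONTROL, "");

// Hypothetical helper: returns the text between the first and last
// double quote, i.e. the body of the string literal in this source.
const literalOf = (text) =>
  text.slice(text.indexOf('"') + 1, text.lastIndexOf('"'));

const before = literalOf(source).length;  // literal as the author wrote it
const after = literalOf(stripped).length; // literal as the compiler would see it

console.log(before, after);
```

The literal loses a character that the author put there deliberately, and the same would happen to a regex that is meant to match such a character.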

--lars


On Jan 9, 2007, at 3:08 PM, Hallvord R. M. Steen wrote:

> Hi,
> I've come across an incompatibility between Opera and some other  
> browsers: if there is a Unicode Zero Width No-Break Space character  
> in the script source the script will not compile in Opera. This  
> character is usually known as the Unicode Byte Order Mark (BOM). If  
> it is at the start of a script file sent as UTF-8 it will be  
> removed before compilation, but if it is inside the script and not  
> within a string it will break the script.
>
> According to ECMA-262 "Any other Unicode space separator <USP>"  
> should be treated as whitespace. But apparently that only covers  
> the Zs class in Unicode, which currently consists of the following  
> code points:
>
>   0020;SPACE;Zs;0;WS;;;;;N;;;;;
>   00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
>   1680;OGHAM SPACE MARK;Zs;0;WS;;;;;N;;;;;
>   180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;
>   2000;EN QUAD;Zs;0;WS;2002;;;;N;;;;;
>   2001;EM QUAD;Zs;0;WS;2003;;;;N;;;;;
>   2002;EN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   2003;EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   2004;THREE-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   2005;FOUR-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   2006;SIX-PER-EM SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
>   2008;PUNCTUATION SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   2009;THIN SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   200A;HAIR SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
>   205F;MEDIUM MATHEMATICAL SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
>   3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;
>
> FEFF has the class "Cf" which means "Other, format".
>
> Hence, Opera is compliant with the ECMA-262 spec in not  
> considering the U+FEFF character a "white space" character in  
> script source. Is this something Firefox would consider a bug and  
> fix, or would it be better to spec ES4 to allow the U+FEFF  
> character inside script source?
>
> -- 
> Hallvord R. M. Steen
> Core QA JavaScript tester, Opera Software
> http://www.opera.com/
> Opera - simply the best Internet experience
> _______________________________________________
> Es4-discuss mailing list
> Es4-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es4-discuss



