Full Unicode strings strawman

Allen Wirfs-Brock allen at wirfs-brock.com
Mon May 16 13:22:32 PDT 2011


On May 16, 2011, at 12:28 PM, Mike Samuel wrote:

> 
> DOMString is defined at
> http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578 thus
> 
>    Type Definition DOMString
>    A DOMString is a sequence of 16-bit units.
> 
> so how would round tripping a JS string through a DOM string work?

The DOM spec says: "Applications must encode DOMString using UTF-16 (defined in [Unicode] and Amendment 1 of [ISO/IEC 10646])." So it must continue to do this.

Values returned as DOM strings would be (21-bit char enhanced) ES strings where each string character contains a 16-bit UTF-16 code unit, just like they do now. Code that processes such strings would have to do explicit surrogate pair processing, just like it does now. However, such a string could be converted to a non-UTF-16 encoded string by explicit user code or via a new built-in function such as:
   String.UTF16Decode(aDOMStringValue)
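
To make the idea concrete, here is a rough sketch of what such a decode step might do. It is only an illustration, not the strawman's definition: current engines cannot hold a 21-bit character in a string, so the "full Unicode" result is modelled as an array of code point numbers, and the name utf16Decode and its handling of unpaired surrogates are assumptions.

```javascript
// Hypothetical sketch of a UTF-16 decode step. The decoded "full Unicode"
// string is modelled as an array of code points (an assumption).
function utf16Decode(domString) {
  var codePoints = [];
  for (var i = 0; i < domString.length; i++) {
    var unit = domString.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one code point.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < domString.length) {
      var next = domString.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        codePoints.push(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    // BMP code units and unpaired surrogates are passed through unchanged
    // (one possible policy; the strawman does not settle this here).
    codePoints.push(unit);
  }
  return codePoints;
}
```

With this model, a DOM string holding one surrogate pair has length 2 before decoding and yields a single element after decoding.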

For passing strings from ES to a DOMString we have to do the inverse conversion. If explicit decoding was done as suggested above, then explicit UTF-16 encoding probably should be done as well. But note that the internal representation of the string is likely to know whether an actual string contains any characters with code points > \uffff.  It may be reasonable to assume that strings without such characters are already DOMString encoded, but that strings with such characters should be automatically UTF-16 encoded when they are passed as DOMString values.
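
The inverse direction could look roughly like the following. Again this is a sketch under assumptions: the full Unicode string is modelled as an array of code points, and the names hasSupplementary and utf16Encode are made up for illustration. The first function stands in for the check an implementation might make on its internal representation before deciding whether automatic encoding is needed.

```javascript
// Hypothetical check: does the (modelled) string contain any code point
// above 0xFFFF? An engine could track this in its internal representation.
function hasSupplementary(codePoints) {
  for (var i = 0; i < codePoints.length; i++) {
    if (codePoints[i] > 0xFFFF) return true;
  }
  return false;
}

// Hypothetical UTF-16 encode step: each code point above 0xFFFF becomes
// a surrogate pair; everything else becomes a single 16-bit code unit.
function utf16Encode(codePoints) {
  var result = "";
  for (var i = 0; i < codePoints.length; i++) {
    var cp = codePoints[i];
    if (cp > 0xFFFF) {
      var offset = cp - 0x10000;
      result += String.fromCharCode(0xD800 + (offset >> 10),
                                    0xDC00 + (offset & 0x3FF));
    } else {
      result += String.fromCharCode(cp);
    }
  }
  return result;
}
```

Under this model a string for which hasSupplementary is false encodes to a string of the same length, which is why it could be treated as already DOMString encoded.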

> 
> How would
> 
>    var oneSupplemental = "\U00010000";
I don't think I understand your literal notation. \U is a 32-bit character value?  In whose implementation?
>    alert(oneSupplemental.length);  //  alerts 1
I'll take your word for this
>    var utf16Encoded = encodeUTF16(oneSupplemental);
>    alert(utf16Encoded.length);  //  alerts 2
yes
>    var textNode = document.createTextNode(utf16Encoded);
>    alert(textNode.nodeValue.length);   // alerts ?
2
> Does the DOM need to represent utf16Encoded internally so that it can
> report 2 as the length on fetch of nodeValue?
However the DOM represents DOMString values internally, to conform to the DOM spec it must act as if it is representing them using UTF-16.
> If so, how can it
> represent that for systems that use a UTF-16 internal representation
> for DOMString?
Let me know if I haven't already answered this.
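
To illustrate the "act as if" point, here is a small model of a DOM implementation that stores text as code points internally but still reports lengths in UTF-16 code units. ModelTextNode and its members are hypothetical names for this sketch, not real DOM APIs.

```javascript
// Hypothetical model of a text node: internal storage is an array of
// code points, but the observable length is counted in UTF-16 code units.
function ModelTextNode(utf16Data) {
  this.codePoints = [];
  for (var i = 0; i < utf16Data.length; i++) {
    var unit = utf16Data.charCodeAt(i);
    var next = i + 1 < utf16Data.length ? utf16Data.charCodeAt(i + 1) : 0;
    if (unit >= 0xD800 && unit <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
      // Combine a surrogate pair into one internally stored code point.
      this.codePoints.push(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
      i++;
    } else {
      this.codePoints.push(unit);
    }
  }
}

// Length as a script would observe it through nodeValue: each code point
// above 0xFFFF counts as two code units, everything else as one.
ModelTextNode.prototype.nodeValueLength = function () {
  var units = 0;
  for (var i = 0; i < this.codePoints.length; i++) {
    units += this.codePoints[i] > 0xFFFF ? 2 : 1;
  }
  return units;
};
```

For the UTF-16 encoded form of one supplementary character, this model stores a single code point internally while nodeValueLength() still reports 2.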
> 
> 
Allen
