Full Unicode strings strawman
allen at wirfs-brock.com
Mon May 16 13:22:32 PDT 2011
On May 16, 2011, at 12:28 PM, Mike Samuel wrote:
> DOMString is defined at
> http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578 thus
> Type Definition DOMString
> A DOMString is a sequence of 16-bit units.
> so how would round tripping a JS string through a DOM string work?
Because the DOM spec says "Applications must encode DOMString using UTF-16 (defined in [Unicode] and Amendment 1 of [ISO/IEC 10646])", it must continue to do this.
Values returned as DOM strings would be (21-bit char enhanced) ES strings where each string character contains a 16-bit UTF-16 code unit, just as they are now. Code that processes such strings would have to do explicit surrogate pair processing, just as it does now. However, such a string could be converted to a non-UTF-16 encoded string by explicit user code or via a new built-in function.
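As a rough sketch of what such an explicit decoding step might look like in user code: the function name decodeUTF16 is hypothetical (it is not defined by the strawman or by any spec), and since today's ES strings cannot hold 21-bit characters, an array of code point numbers stands in for the "full Unicode" string.

```javascript
// Hypothetical helper, for illustration only.
// Input:  a string whose elements are 16-bit UTF-16 code units.
// Output: an array of code points, standing in for a 21-bit-char string.
function decodeUTF16(str) {
  const codePoints = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one
    // supplementary character.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        codePoints.push(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP code units and unpaired surrogates pass through unchanged.
    codePoints.push(unit);
  }
  return codePoints;
}

const decoded = decodeUTF16("a\uD800\uDC00");
console.log(decoded.length); // 2
```

Passing unpaired surrogates through unchanged is one possible choice; a real built-in might instead throw or substitute U+FFFD.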
For passing strings from ES to a DOMString we have to do the inverse conversion. If explicit decoding was done as suggested above, then explicit UTF-16 encoding probably should be done as well. But note that the internal representation of a string is likely to know whether the actual string contains any characters with code points > \uffff. It may be reasonable to assume that strings without such characters are already DOMString encoded, but that strings with such characters should be automatically UTF-16 encoded when they are passed as DOMString values.
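A minimal sketch of the inverse direction, under the same assumptions: encodeUTF16 and hasSupplementary are hypothetical names, and an array of code point numbers stands in for a string with 21-bit characters. The check for code points above 0xFFFF is the test an implementation could use to decide whether automatic encoding is needed.

```javascript
// Hypothetical helpers, for illustration only.

// True if any code point lies outside the BMP, i.e. the string cannot be
// handed to the DOM as-is and needs surrogate pairs.
function hasSupplementary(codePoints) {
  return codePoints.some(cp => cp > 0xFFFF);
}

// Encode an array of code points as a string of 16-bit UTF-16 code units.
function encodeUTF16(codePoints) {
  let out = "";
  for (const cp of codePoints) {
    if (cp > 0xFFFF) {
      const offset = cp - 0x10000;
      out += String.fromCharCode(
        0xD800 + (offset >> 10),   // high (lead) surrogate
        0xDC00 + (offset & 0x3FF)  // low (trail) surrogate
      );
    } else {
      out += String.fromCharCode(cp); // already a single code unit
    }
  }
  return out;
}

console.log(hasSupplementary([0x10000]));    // true
console.log(encodeUTF16([0x10000]).length);  // 2
```

For an input with no code points above 0xFFFF the encoder leaves every unit unchanged, which matches the assumption that such strings are already DOMString encoded.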
> How would
> var oneSupplemental = "\U00010000";
I don't think I understand your literal notation. Is \U a 32-bit character value? In whose implementation?
> alert(oneSupplemental.length); // alerts 1
I'll take your word for this.
> var utf16Encoded = encodeUTF16(oneSupplemental);
> alert(utf16Encoded.length); // alerts 2
> var textNode = document.createTextNode(utf16Encoded);
> alert(textNode.nodeValue.length); // alerts ?
> Does the DOM need to represent utf16Encoded internally so that it can
> report 2 as the length on fetch of nodeValue?
However the DOM represents DOMString values internally, to conform to the DOM spec it must act as if it is representing them using UTF-16.
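To illustrate the "act as if" point, here is a small model, not real DOM API: ModelTextNode is a hypothetical class that stores its text internally as code points, yet still reports lengths in UTF-16 code units when the value is read back. It assumes an engine that provides String.prototype.codePointAt and String.fromCodePoint.

```javascript
// Hypothetical model of a text node, for illustration only.
class ModelTextNode {
  constructor(utf16String) {
    // Internal representation: one entry per code point. Iterating a
    // string with Array.from collapses each surrogate pair to one entry.
    this.codePoints = Array.from(utf16String, ch => ch.codePointAt(0));
  }

  // Observable value: always a sequence of 16-bit UTF-16 code units.
  get nodeValue() {
    let out = "";
    for (const cp of this.codePoints) {
      out += String.fromCodePoint(cp);
    }
    return out;
  }
}

const node = new ModelTextNode("\uD800\uDC00"); // U+10000 as a surrogate pair
console.log(node.codePoints.length); // 1
console.log(node.nodeValue.length);  // 2
```

Under this reading, the nodeValue length in the quoted example would be 2 regardless of how the DOM stores the text.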
> If so, how can it
> represent that for systems that use a UTF-16 internal representation
> for DOMString?
Let me know if I haven't already answered this.