Full Unicode strings strawman

Erik Corry erik.corry at gmail.com
Wed May 18 14:46:24 PDT 2011


2011/5/17 Wes Garland <wes at page.ca>:
> If you're already storing UTF-8 strings internally, then you are already
> doing something "expensive" (like copying) to get their code units into and
> out of JS; so no incremental perf impact by not having a common UTF-16
> backing store.
>
>>
>> (As a note, Gecko and WebKit both use UTF-16 internally; I would be
>> _really_ surprised if Trident does not.  No idea about Presto.)
>
> FWIW - last time I scanned the v8 sources, it appeared to use a
> three-representation class, which could store either ASCII, UCS2, or UTF-8.
> Presumably ASCII could also be ISO-Latin-1, as both are exact, naive,
> byte-sized UCS2/UTF-16 subsets.

V8 has ASCII strings and UCS2 strings.  There are no Latin1 strings
and UTF-8 is only used for IO, never for internal representation.
WebKit uses UCS2 throughout and V8 is able to work directly on WebKit
UCS2 strings that are on WebKit's C++ heap.
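
To illustrate the two-representation idea, here is a rough sketch in
Python. It is not V8 code: the function names and the ("ascii" /
"two-byte") tags are made up for this example, and the only rule it
assumes is the one stated above, namely that a string is kept in a
byte-per-character form only when every code unit is ASCII, and in
16-bit code units otherwise (Latin1 does not get a compact form).

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text (hypothetical helper)."""
    data = text.encode("utf-16-le")
    # Each code unit is two little-endian bytes.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def choose_backing_store(text):
    """Pick a backing store for text (hypothetical, illustration only).

    Returns ("ascii", bytes) when every code unit is below 0x80,
    otherwise ("two-byte", list of 16-bit code units).
    """
    units = utf16_code_units(text)
    if all(unit < 0x80 for unit in units):
        # One byte per character is enough.
        return ("ascii", bytes(units))
    # Anything else, including Latin1 characters such as U+00E9,
    # falls back to 16-bit code units in this sketch.
    return ("two-byte", units)
```

With this sketch, choose_backing_store("hello") gives the one-byte
form, while a string containing U+00E9 or a character outside the BMP
(stored as a surrogate pair) gives the two-byte form.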

I like Shawn Steele's suggestion.

-- 
Erik Corry


More information about the es-discuss mailing list