Full Unicode strings strawman
erik.corry at gmail.com
Wed May 18 14:46:24 PDT 2011
2011/5/17 Wes Garland <wes at page.ca>:
> If you're already storing UTF-8 strings internally, then you are already
> doing something "expensive" (like copying) to get their code units into and
> out of JS; so no incremental perf impact by not having a common UTF-16
> backing store.
>> (As a note, Gecko and WebKit both use UTF-16 internally; I would be
>> _really_ surprised if Trident does not. No idea about Presto.)
> FWIW - last time I scanned the v8 sources, it appeared to use a
> three-representation class, which could store either ASCII, UCS2, or UTF-8.
> Presumably ASCII could also be ISO-Latin-1, as both are exact, naive,
> byte-sized UCS2/UTF-16 subsets.
V8 has ASCII strings and UCS2 strings. There are no Latin1 strings
and UTF-8 is only used for IO, never for internal representation.
WebKit uses UCS2 throughout and V8 is able to work directly on WebKit
UCS2 strings that are on WebKit's C++ heap.
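To make the two-representation idea concrete, here is a rough sketch in Python. It is not V8's code, only an illustration under my own assumptions: the names choose_representation, utf16_code_units and widen are hypothetical, and the byte layouts are simplified. It picks a one-byte backing store when every 16-bit code unit is in the ASCII range and a two-byte store otherwise, and it shows that widening a byte-sized string to 16-bit units is plain zero-extension.

```python
from typing import List, Tuple


def utf16_code_units(text: str) -> List[int]:
    """Split a Python string into 16-bit code units, as a JS engine would see it."""
    # "surrogatepass" lets lone surrogates through, which JS strings permit.
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def choose_representation(text: str) -> Tuple[str, bytes]:
    """Pick a backing store for a hypothetical two-representation string class."""
    units = utf16_code_units(text)
    if all(u < 0x80 for u in units):
        # Every code unit fits in 7 bits: store one byte per character.
        return "ascii", bytes(units)
    # Otherwise store each code unit in two bytes (little-endian here by assumption).
    return "two-byte", b"".join(u.to_bytes(2, "little") for u in units)


def widen(data: bytes) -> bytes:
    """Zero-extend each byte to a 16-bit unit.

    This works for ASCII, and would also work for Latin-1, because both map
    byte value N to code point N.
    """
    return b"".join(bytes([b, 0]) for b in data)
```

With this sketch, "abc" gets the one-byte store, while "é" (U+00E9) falls through to the two-byte store because there is no Latin-1 representation, which mirrors the situation described above.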
I like Shawn Steele's suggestion.