Full Unicode strings strawman
Mark Davis ☕
mark at macchiato.com
Wed May 18 14:54:54 PDT 2011
Yes, one of the options for the internal storage of the string class is to
use different arrays depending on the contents.
1. uint8s if all the code points are <= FF
2. uint16s if all the code points are <= FFFF
3. uint32s otherwise
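The three cases above can be sketched in Python with the standard `array` module. This is a minimal illustration under stated assumptions, not any engine's implementation, and `pack_code_points` is a hypothetical name. For comparison, CPython's own `str` has used a similar flexible scheme (1, 2, or 4 bytes per code point, chosen by the largest code point in the string) since Python 3.3, per PEP 393.

```python
from array import array

# Typecode for an unsigned integer of at least 32 bits. 'I' is 4 bytes on
# mainstream platforms, but the array module only guarantees 2, so fall back to 'L'.
UINT32 = "I" if array("I").itemsize >= 4 else "L"


def pack_code_points(s: str) -> array:
    """Store s in the narrowest array that can hold every code point in it."""
    code_points = [ord(ch) for ch in s]
    top = max(code_points, default=0)
    if top <= 0xFF:
        typecode = "B"      # case 1: uint8
    elif top <= 0xFFFF:
        typecode = "H"      # case 2: uint16
    else:
        typecode = UINT32   # case 3: uint32
    return array(typecode, code_points)


for text in ["caf\u00e9", "\u20ac100", "G clef: \U0001D11E"]:
    packed = pack_code_points(text)
    # Element i of the array is code point i of the string in every case.
    print(packed.typecode, packed.itemsize, len(packed))
```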
That way, each array element holds exactly one code point, so the array
index always equals the code point index, which makes random access fast
(constant time). Case #3 occurs rarely, so it is OK if it takes more
storage.
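By contrast, a store of UTF-16 code units (what the thread reports Gecko and WebKit use) needs two units, a surrogate pair, for any code point above FFFF, so locating code point *i* requires scanning from the start. A hedged Python sketch, assuming well-formed UTF-16 with no unpaired surrogates; `utf16_units` and `code_point_at` are hypothetical helpers:

```python
def utf16_units(s: str) -> list[int]:
    """Encode s as UTF-16 code units, splitting code points above FFFF into surrogate pairs."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low (trail) surrogate
    return units


def code_point_at(units: list[int], index: int) -> int:
    """Return the code point at code point index `index` by scanning: O(index) work."""
    pos = 0
    for _ in range(index):
        # Skip one code point: two units for a surrogate pair, otherwise one.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    lead = units[pos]
    if 0xD800 <= lead <= 0xDBFF:
        trail = units[pos + 1]
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
    return lead


units = utf16_units("a\U0001D11Eb")
print(len(units), hex(code_point_at(units, 1)), chr(code_point_at(units, 2)))
```

With the fixed-width arrays of cases 1 to 3, the same lookup is a single indexing operation; the scan is the extra cost that code point indexing over UTF-16 storage would add.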
*Il meglio è l’inimico del bene* (The best is the enemy of the good.)
On Wed, May 18, 2011 at 14:46, Erik Corry <erik.corry at gmail.com> wrote:
> 2011/5/17 Wes Garland <wes at page.ca>:
> > If you're already storing UTF-8 strings internally, then you are already
> > doing something "expensive" (like copying) to get their code units into
> > and out of JS, so there is no incremental perf impact from not having a
> > common UTF-16 backing store.
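To illustrate the quoted point: an engine that keeps UTF-8 internally cannot hand JS a view of UTF-16 code units over the same bytes; it has to transcode into a new buffer, whose length generally differs from the original. A minimal Python sketch relying on Python's built-in codecs; `utf8_to_utf16le` is a hypothetical helper, not an engine API:

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Transcode a UTF-8 buffer into a freshly allocated UTF-16LE buffer (a full copy)."""
    return data.decode("utf-8").encode("utf-16-le")


for text in ["abc", "caf\u00e9", "\U0001D11E"]:
    utf8 = text.encode("utf-8")
    utf16 = utf8_to_utf16le(utf8)
    # Each UTF-16 code unit is 2 bytes, so the unit count is len(utf16) // 2.
    print(len(utf8), "UTF-8 bytes ->", len(utf16) // 2, "UTF-16 code units")
```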
> >> (As a note, Gecko and WebKit both use UTF-16 internally; I would be
> >> _really_ surprised if Trident does not. No idea about Presto.)
> > FWIW, the last time I scanned the V8 sources, it appeared to use a
> > three-representation class, which could store either ASCII, UCS2, or
> > UTF-8. Presumably ASCII could also be ISO-Latin-1, as both are exact,
> > naive, byte-sized UCS2/UTF-16 subsets.
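The quoted reasoning about ISO-Latin-1 can be checked directly: the first 256 Unicode code points coincide with ISO-8859-1, so widening a Latin-1 byte to a UTF-16 code unit is plain zero-extension, and ASCII (0x00 to 0x7F) is a subset of that range. A small Python sketch with a hypothetical helper name; note that this holds for ISO-8859-1 but not for Windows-1252, which assigns printable characters to most of 0x80 to 0x9F.

```python
def latin1_to_utf16_units(data: bytes) -> list[int]:
    """Widen ISO-8859-1 bytes to UTF-16 code units.

    Each byte value (0x00 to 0xFF) equals the Unicode code point it encodes,
    and every such code point is a single UTF-16 code unit, so the bytes are
    copied unchanged into wider slots with no table lookup.
    """
    return list(data)


sample = "na\u00efve caf\u00e9".encode("latin-1")
print(latin1_to_utf16_units(sample))
```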
> V8 has ASCII strings and UCS2 strings. There are no Latin1 strings
> and UTF-8 is only used for IO, never for internal representation.
> WebKit uses UCS2 throughout and V8 is able to work directly on WebKit
> UCS2 strings that are on WebKit's C++ heap.
> I like Shawn Steele's suggestion.
> Erik Corry
> es-discuss mailing list
> es-discuss at mozilla.org