TT strings: implementation questions

Edwin Smith edwsmith at adobe.com
Wed Jun 11 07:31:36 PDT 2008


> Why does localeCompare() have this requirement? ECMA-262 does not
> mention it.

This is not a requirement of ECMA-262. I remember testing SpiderMonkey,
and its compare results are not always -1/0/1. However, if you're
subtracting characters, the sign of the result is only valid if there is
no wraparound. Something to remember for 32-bit characters.
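
To illustrate the wraparound point, here is a minimal sketch that
compares 32-bit code units with explicit comparisons instead of
subtraction. It is not Tamarin code; compareUnits32, subtractUnits32 and
compareStrings32 are hypothetical names. Since ECMA-262 only asks
localeCompare for a negative, zero or positive result, a sign-only
result like the one memcmp() returns is sufficient.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not taken from the Tamarin sources.

// Sign-only comparison of two 32-bit code units: -1, 0 or 1.
// Explicit comparisons cannot wrap around.
static int compareUnits32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

// Subtraction-based comparison: the sign is only correct while a - b
// fits in int32_t. (Converting an out-of-range value to int32_t is
// implementation-defined before C++20; common compilers wrap.)
static int32_t subtractUnits32(uint32_t a, uint32_t b)
{
    return static_cast<int32_t>(a - b);
}

// Lexicographic comparison of two 32-bit strings, returning -1/0/1.
// A string that is a proper prefix of the other sorts first.
static int compareStrings32(const uint32_t* a, std::size_t alen,
                            const uint32_t* b, std::size_t blen)
{
    std::size_t n = alen < blen ? alen : blen;
    for (std::size_t i = 0; i < n; ++i) {
        int c = compareUnits32(a[i], b[i]);
        if (c != 0)
            return c;
    }
    return (alen > blen) - (alen < blen);
}
```

For example, compareUnits32(0x80000000, 0) is positive, while
subtractUnits32 on the same inputs wraps to a negative value on typical
two's-complement compilers.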

> 2) It appears that SymTable and SymTableKey use the ABC image
> (PoolObject), so the code is UTF-8 only. Is that correct? Or does
> anything else use SymTable and SymTableKey?

I think that's it; grep would tell for sure. The idea is for symbol
tables to be composed of lightweight string-like objects that refer
back to the cpool. There isn't any requirement for them to become
full-fledged strings while living in the symbol table, but traversing
the symbol table may require generating short-lived strings from the
keys. That case would require UTF-8 parsing. If the cpool changes to
some other format, this should still work.

Embedders will still want the ability to strip out the code overhead of
the 8- and 32-bit cases, but I don't see that as a reason to hold up
integration.
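
A sketch of what the per-width switch() overhead looks like, and how an
embedder might compile out the 8- and 32-bit arms. WideString, Width and
the STRINGS_16BIT_ONLY macro are made-up names for illustration, not
part of the TT string code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical width tag; values are the bytes per character.
enum class Width : uint8_t { k8 = 1, k16 = 2, k32 = 4 };

struct WideString {
    const void* data;
    std::size_t length;  // in characters, not bytes
    Width width;

    // One switch per access is the cost of supporting several widths.
    // Defining the (hypothetical) STRINGS_16BIT_ONLY macro removes the
    // 8- and 32-bit arms at compile time.
    uint32_t charAt(std::size_t i) const {
        switch (width) {
#ifndef STRINGS_16BIT_ONLY
        case Width::k8:  return static_cast<const uint8_t*>(data)[i];
        case Width::k32: return static_cast<const uint32_t*>(data)[i];
#endif
        case Width::k16: return static_cast<const uint16_t*>(data)[i];
        default: break;
        }
        return 0;  // only reached for widths compiled out above
    }
};

// A single memcmp() cannot compare operands of different widths, so this
// sketch reads one character at a time and returns -1/0/1.
int compare(const WideString& a, const WideString& b) {
    std::size_t n = a.length < b.length ? a.length : b.length;
    for (std::size_t i = 0; i < n; ++i) {
        uint32_t ca = a.charAt(i), cb = b.charAt(i);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return (a.length > b.length) - (a.length < b.length);
}
```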

I nominate Steven Johnson as a mentor; he's the author of most of that
UTF-8 code and is elbow-deep in player integration, which includes
touching the player string goo. Any seconds?

Ed

> -----Original Message-----
> From: tamarin-devel-bounces at mozilla.org [mailto:tamarin-devel-bounces at mozilla.org] On Behalf Of Michael Daumling
> Sent: Wednesday, June 11, 2008 2:50 AM
> To: tamarin-devel at mozilla.org
> Subject: TT strings: implementation questions
> 
> Hi all,
> 
> I've implemented the core string code, and now, I am facing its
> embedding into TT. This, of course, raises a ton of questions, so get
> prepared for the first ones...
> 
> 1) I am seeing this comment in StringObject.h:
> 
> // unfortunately, memcmp isn't guaranteed to return the actual
> // difference between the final bytes (as required by localeCompare),
> // only -1/0/1, and the MSVC implementation seems to do the latter....
> // Sigh
> 
> Why does localeCompare() have this requirement? ECMA-262 does not
> mention it.
> 
> 2) It appears that SymTable and SymTableKey use the ABC image
> (PoolObject), so the code is UTF-8 only. Is that correct? Or does
> anything else use SymTable and SymTableKey?
> 
> 3) Earlier on, we discussed having a version that uses 16-bit strings
> only. The current version supports 8, 16, and 32 bits, because the
> overhead is minimal IMHO; often, this is just an additional switch()
> statement. This allows the direct usage of ABC image data and makes
> better use of memory. It slows down string comparisons a bit, because I
> cannot always use memcmp(), but see my question #1. Is that OK?
> 
> Strategy:
> 
> I will need help and guidance on the integration strategy when I am
> done with the initial implementation. I expect to have a local TT
> version with the new strings integrated up and running by the end of
> June. I suggest that I leave as much code untouched as possible in the
> first round and just replace the string core code, adding UTF-8
> wrappers where necessary. The result is, of course, that the new
> strings will probably not increase performance in the first step, but
> performance will improve over time as unnecessary UTF-8 conversions are
> removed from TT.
> 
> Here is one such example (from XMLClass.cpp):
> 
> int32_t len;
> {
> 	StringDataUTF8 utf8(tag.text);
> 	len = utf8.lenbytes();
> }
> if (len < 32)
> {...}
> 
> This is very costly in my current implementation, since the
> StringDataUTF8 class needs to create and encode a UTF-8 string (in case
> tag.text is wider than 8 bits) just to get the length in bytes, but it
> is evident that this will be much faster:
> 
> if (tag.text->getLen() < 32)
> {...}
> 
> I do not necessarily want to fill this mailing list with implementation
> questions, so if anyone wants to step forward and be my personal mentor,
> please let me know!
> 
> Michael
> _______________________________________________
> Tamarin-devel mailing list
> Tamarin-devel at mozilla.org
> https://mail.mozilla.org/listinfo/tamarin-devel
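
For the XMLClass.cpp case, the UTF-8 byte length can also be computed
without building a temporary encoded string, as in the sketch below
(utf8LengthOf is a hypothetical helper, and it assumes 16-bit input).
Note that the UTF-8 byte length is never smaller than the character
count, so getLen() < 32 is cheaper but not strictly equivalent to the
original byte-length test: it also accepts some strings whose encoded
form is 32 bytes or longer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes the UTF-8 encoding of a UTF-16
// buffer would occupy, computed without allocating or encoding anything.
// A valid surrogate pair counts as one 4-byte sequence; a lone surrogate
// is counted as 3 bytes.
std::size_t utf8LengthOf(const uint16_t* s, std::size_t n) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        uint16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;  // high + low surrogate -> one 4-byte sequence
            ++i;         // skip the low surrogate
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```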

