TT strings: implementation questions

Steven Johnson stejohns at
Wed Jun 11 08:56:45 PDT 2008

(1) I inserted that comment... the spec for localeCompare doesn't in fact
require this behavior, but the Tamarin sanity/acceptance tests did. IMHO we
should just use memcmp unless SpiderMonkey has such a de facto compatibility
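
For reference, the C standard only specifies the sign of memcmp's result, not its magnitude. A minimal sketch of the two behaviors under discussion follows; the helper names compareSign and compareByteDiff are hypothetical and are not taken from the Tamarin sources:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: collapse memcmp's result to exactly -1, 0 or 1,
// since only the sign of the return value is portable.
inline int compareSign(const void* a, const void* b, std::size_t n)
{
    assert(n == 0 || (a != nullptr && b != nullptr));
    const int r = std::memcmp(a, b, n);
    return (r > 0) - (r < 0);
}

// Hypothetical helper: return the difference between the first pair of
// differing bytes (compared as unsigned char), which memcmp is not
// guaranteed to do.
inline int compareByteDiff(const void* a, const void* b, std::size_t n)
{
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    for (std::size_t i = 0; i < n; ++i) {
        if (pa[i] != pb[i])
            return int(pa[i]) - int(pb[i]);
    }
    return 0;
}
```

Code that only tests the result against zero works with either helper; only tests that check the exact difference need the second one.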

(2) Yep. ABC is UTF8-only, without null-termination.

(3) Sounds like a good tradeoff to me.

Re: implementation questions, since I am the one responsible for the current
disaster that is String in TT, I'll be happy to take your questions off-list

Re: the code snippet you used in XMLClass, IIRC that's just an optimization
for "short" strings -- the idea is that many of the names in XML are
re-used, so intern the "small" ones to save memory. As such, there's nothing
magic about 32, so any fast way to decide that a string is "suitably short"
would probably be equally useful.
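
A minimal sketch of the interning idea, assuming std::string names and a standard hash set rather than the actual TT String and intern table; NameInterner, kInternThreshold, and their members are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical cutoff. The XMLClass code used 32, but any cheap
// "suitably short" test would serve the same purpose.
constexpr std::size_t kInternThreshold = 32;

class NameInterner {
public:
    // Cheap check: uses the stored length, no re-encoding required.
    bool shouldIntern(const std::string& name) const
    {
        return name.size() < kInternThreshold;
    }

    // Returns the single shared copy of a short name. References to
    // elements of std::unordered_set stay valid across rehashing.
    const std::string& intern(const std::string& name)
    {
        assert(shouldIntern(name));
        return *pool_.insert(name).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Interning the same short name twice yields the same object, so repeated XML names share one copy, while long strings skip the table entirely.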

On 6/10/08 11:49 PM, "Michael Daumling" <mdaeumli at> wrote:

> Hi all,
> I've implemented the core string code, and now, I am facing its
> embedding into TT. This, of course, raises a ton of questions, so get
> prepared for the first ones...
> 1) I am seeing this comment in StringObject.h:
> // unfortunately, memcmp isn't guaranteed to return the actual difference
> // between the final bytes (as required by localeCompare), only -1/0/1,
> // and the MSVC implementation seems to do the latter.... Sigh
> Why does localeCompare() have this requirement? ECMA-262 does not
> mention it.
> 2) It appears that SymTable and SymTableKey use the ABC image
> (PoolObject), so the code is UTF-8 only. Is that correct? Or does
> anything else use SymTable and SymTableKey?
> 3) Earlier on, we discussed having a version that uses 16-bit strings
> only. The current version supports 8, 16, and 32 bits, because the
> overhead is minimal IMHO - often, this is just an additional switch()
> statement. This allows the direct usage of ABC image data, and makes
> better use of memory. It slows down string comparisons a bit, because I
> cannot always use memcmp(), but see my question #1. Is that OK?
> Strategy:
> I will need help and guidance about the integration strategy when I am
> done with the initial implementation. I expect a local TT version with
> the new strings integrated up and running by the end of June. I suggest
> that I leave as much code untouched as possible in the first round, and
> just replace the string core code, with additional UTF-8 wrappers when
> necessary. The result is, of course, that the new string will probably
> not increase performance in the first step, but that performance will
> increase over time as unnecessary UTF-8 conversions are removed from TT.
> This is such an example (from XMLClass.cpp):
> int32_t len;
> {
>     StringDataUTF8 utf8(tag.text);
>     len = utf8.lenbytes();
> }
> if (len < 32)
> {...}
> This is very costly in my current implementation, since the
> StringDataUTF8 class needs to create and encode a UTF-8 string (in case
> tag.text is wider than 8 bits) just to get the length in bytes, but it
> is evident that this will be much faster:
> if (tag.text->getLen() < 32)
> {...}
> I do not necessarily want to fill this mailing list with implementation
> questions, so if anyone wants to step forward and be my personal mentor,
> please let me know!
> Michael