TT strings: implementation questions

Michael Daumling mdaeumli at adobe.com
Tue Jun 10 23:49:54 PDT 2008


Hi all,

I've implemented the core string code, and now I am facing the task of
embedding it into TT. This, of course, raises a ton of questions, so be
prepared for the first ones...

1) I am seeing this comment in StringObject.h:

// unfortunately, memcmp isn't guaranteed to return the actual difference
// between the final bytes (as required by localeCompare), only -1/0/1,
// and the MSVC implementation seems to do the latter.... Sigh

Why does localeCompare() have this requirement? ECMA-262 does not
mention it.
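
For reference, the C standard only specifies the sign of memcmp()'s
result, not its magnitude. If the actual byte difference really were
needed, a hand-written loop would have to replace memcmp(). The following
is a minimal sketch of such a loop; compareBytesDiff is a hypothetical
name and is not taken from StringObject.h.

```cpp
#include <cstddef>

// Hypothetical helper (not from the TT sources): compares two byte ranges
// and returns the numeric difference between the first pair of bytes that
// differ, or 0 if the first n bytes are identical.
int compareBytesDiff(const unsigned char* a, const unsigned char* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return int(a[i]) - int(b[i]);
    }
    return 0;
}
```

If only the sign matters, memcmp() is sufficient and this loop is not
needed.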

2) It appears that SymTable and SymTableKey use the ABC image
(PoolObject), so the code is UTF-8 only. Is that correct? Or does
anything else use SymTable and SymTableKey?

3) Earlier on, we discussed having a version that uses 16-bit strings
only. The current version supports 8, 16, and 32 bits, because the
overhead is minimal IMHO - often, this is just an additional switch()
statement. This allows the direct usage of ABC image data, and makes
better use of memory. It slows down string comparisons a bit, because I
cannot always use memcmp(), but see my question #1. Is that OK?
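
To illustrate the trade-off, here is a rough sketch of a comparison over
mixed character widths. It is not the actual implementation: StrView,
Width, charAt, and compareStrings are hypothetical names, and the sketch
assumes that memcmp() is only usable when both strings are 8 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical width tag and string view; the real classes may differ.
enum Width { k8 = 1, k16 = 2, k32 = 4 };

struct StrView {
    const void* data;   // character data in the given width
    std::size_t len;    // length in characters, not bytes
    Width       width;
};

// Reads one character; this is the "additional switch()" per access.
static std::uint32_t charAt(const StrView& s, std::size_t i)
{
    switch (s.width) {
    case k8:  return static_cast<const std::uint8_t*>(s.data)[i];
    case k16: return static_cast<const std::uint16_t*>(s.data)[i];
    default:  return static_cast<const std::uint32_t*>(s.data)[i];
    }
}

// Returns -1, 0, or 1. Only the 8-bit/8-bit case takes the memcmp() path.
int compareStrings(const StrView& a, const StrView& b)
{
    const std::size_t n = a.len < b.len ? a.len : b.len;
    if (a.width == k8 && b.width == k8) {
        const int r = std::memcmp(a.data, b.data, n);
        if (r != 0)
            return r < 0 ? -1 : 1;
    } else {
        for (std::size_t i = 0; i < n; ++i) {
            const std::uint32_t ca = charAt(a, i);
            const std::uint32_t cb = charAt(b, i);
            if (ca != cb)
                return ca < cb ? -1 : 1;
        }
    }
    if (a.len == b.len)
        return 0;
    return a.len < b.len ? -1 : 1;
}
```

The slow path costs one switch() per character read, which is the
slowdown in question.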

Strategy:

I will need help and guidance about the integration strategy when I am
done with the initial implementation. I expect a local TT version with
the new strings integrated up and running by the end of June. I suggest
that I leave as much code untouched as possible in the first round, and
just replace the string core code, with additional UTF-8 wrappers when
necessary. The result is, of course, that the new string will probably
not increase performance in the first step, but that performance will
increase over time as unnecessary UTF-8 conversions are removed from TT.

Here is one such example (from XMLClass.cpp):

int32_t len;
{
	StringDataUTF8 utf8(tag.text);
	len = utf8.lenbytes();
}
if (len < 32)
{...}

This is very costly in my current implementation, since the
StringDataUTF8 class needs to create and encode a UTF-8 string (in case
tag.text is wider than 8 bits) just to get the length in bytes, but it
is evident that this will be much faster:

if (tag.text->getLen() < 32)
{...}
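
Where a caller really does need the UTF-8 byte count rather than the
character count, the count can still be computed without creating the
encoded string. The following is a sketch under assumptions:
utf8LengthOfUtf16 is a hypothetical helper, not an existing TT function,
and it counts an unpaired surrogate as three bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the number of bytes that the given 16-bit
// string would occupy in UTF-8, without allocating or encoding anything.
std::size_t utf8LengthOfUtf16(const std::uint16_t* s, std::size_t len)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const std::uint16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;                       // ASCII
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;                       // valid surrogate pair
            ++i;                              // skip the low surrogate
        } else {
            bytes += 3;                       // rest of the BMP (assumption:
                                              // also unpaired surrogates)
        }
    }
    return bytes;
}
```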

I do not necessarily want to fill this mailing list with implementation
questions, so if anyone wants to step forward and be my personal mentor,
please let me know!

Michael

