encapsulated hash codes
Allen.Wirfs-Brock at microsoft.com
Wed Nov 4 18:43:33 PST 2009
From: Maciej Stachowiak [mailto:mjs at apple.com]
Sent: Wednesday, November 04, 2009 4:50 PM
To: Allen Wirfs-Brock
Subject: Re: encapsulated hash codes
On Nov 4, 2009, at 2:49 PM, Allen Wirfs-Brock wrote:
A straw man proposal for Harmony encapsulated hash codes has been posted to the Wiki at.<http://wiki.ecmascript.org/doku.php?id=strawman:encapsulated_hashcodes>
1) Wouldn't it be simpler to have a single Object.hash() function that returns a uniformly distributed number in the 0..2^32 range, and leave the mod operation up to code using the hash code? Making a new function that (essentially) encapsulates the hash operation and the mod does not seem to pull its weight in API complexity.
Yes, it would and that is the classic approach (perhaps excluding the uniformity of distribution) used by most languages. As an implementer I would be fairly happy with that approach. However, some of our members have significant concerns about the possibility of hash codes being used as a covert communications canner. This proposal tries to avoid that problem by giving each client of identity hashes (most likely each hash table) their own hash function which yields a distinct sequences of hash values (for the same sequence of objects). Since the functions aren't shared the sequence of values they generate can be used as a channel.
Requiring uniform distribution over a 32-bit range may not be practical. If a client only wants hashes in the 0.255 range the obvious function of a 32-bit value (mod 256) ignores most of the bits that actually distinguish the values.
The individual functions probably do a bit more than a simple mod. Typically, the seed that is used to generate per object hash values (for example, the allocation address of an object) has fewer than 32 variable bits (for example, only the size range of the allocation nursery). If the hash function know range of hash values it is expected to generate it can more effectively use the actual variable bits to generate a good distribution.
2) What about non-objects? It would be nice if strings, numbers, and perhaps other primitive types could hash by value rather than by identity, so you could reasonably mix them as hash keys with objects.
Sure this can be added, but isn't an essential primitive. Hash values can be easily generated today for all the built-in value types. But object identity hash requires some sort of primitive support.
2) Security considerations: it should be recommended that the hash code, if computed from the address of the object, is not reversible. (Of course, with a copying collector you probably can't compute the hash code from the address).
Yes, that would be a good requirement. In practice, it's usually the case anyway. Actually, the address is a good seed hash value even for copying collectors. There are tricks that are known by lisp/smalltalk/etc. implementers for efficiently doing this. Happy to share... Ideally, identity hashes cost nothing to compute and require no additional per object storage. The ideal isn't obtainable, but you can get pretty close.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the es-discuss