Non-extensibility of Typed Arrays

Filip Pizlo fpizlo at apple.com
Fri Aug 30 10:41:02 PDT 2013


On Aug 30, 2013, at 9:28 AM, Brendan Eich <brendan at mozilla.com> wrote:

> Hi,
>> Filip Pizlo <mailto:fpizlo at apple.com>
>> August 28, 2013 11:01 PM
>> Here's the part that gets me, though: what is the value of disallowing named properties on typed arrays?  Who does this help?
> 
> You've heard about symmetry with struct types (ES6), right? Those do not want expandos. We could break symmetry but at some cost. Too small to worry about? Outweighed by benefits?

It's a fair point.  I don't see where it would break semantics but I'll try to do a thought experiment to see if it makes things confusing or inconvenient to the programmer.  Whether or not I care depends on the answers to the following questions:

1) Is the purpose to simplify programming by allowing you to add static typing?
2) Are we trying to help JITs?
3) Do we just want a sensible way of mapping to binary data?  (For both DOM and C-to-JS compilers)

It appears that (1) is a non-goal; if it was a goal then we'd have a different aliasing story, we wouldn't have the byteOffset/byteLength/buffer properties, and there would be zero discussion of binary layout.  We'd also bake the types deeper into the language.  This doesn't simplify programming if you have to write code in a bifurcated world with both traditional JS objects (all dynamic, objects can point at each other, but the backing stores of objects don't alias each other) and binary objects (have some types to describe layout, but can't have arbitrary object graphs, and backing stores of distinct objects may alias each other).

(2) appears to be a bit more of a pie-in-the-sky dream than a goal.  A decent JIT will already recognize idioms where the programmer created an object with a clear sequence of fields and then uses that object in a monomorphic way.  Both 'function Constructor() { this.a = ...; this.b = ...; }' and '{a:..., b:...}' will get recognized, through some combination of run-time and compile-time analysis, as indicating that the user intends to have a type that has 'a' and 'b' as fields.  It's true that binary data makes this explicit, but the JIT can fall apart in the same way as it does already for normal objects: the references to these objects tend to be untyped so the programmer can inadvertently introduce polymorphism and lose some (most?) of the benefits.

Because binary data objects will have potentially aliased backing stores, you get the additional problem that you can't do any field-based aliasing analysis.  For a normal JS object, if I know that 'o.a' accesses own-property 'a' and it's not a getter/setter, and 'o.b' accesses own-property 'b' and it's not a getter/setter, then I know that these two accesses don't alias.  For binary data, I don't quite have such a guarantee: 'a' can overlap 'b' in some other object.

Also, the fact that a struct type instance might have to know about a buffer along with an offset into that buffer introduces a greater object size overhead than plain JS objects.  A plain JS object needs roughly two pieces of overhead: something to identify the type and a pointer reserved for when you store more things into it.  A struct type instance will need roughly three pieces of overhead: something to identify the type, a pointer to the buffer, and some indication of the offset within that buffer.

The only performance win from struct types is probably that it gives you an explicit tuple flattening.  That's kind of cool but I remember that C# had struct types while Java didn't and yet JVMs still killed .NET on any meaningful measure of performance.
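
To make the aliasing point concrete, here is a minimal sketch using typed array views rather than ES6 struct types (which never shipped in this form).  The variable names are made up for illustration.  Two distinct view objects share one backing store, so a write through one is visible through the other, whereas own data properties of a plain object never overlap:

```javascript
// Two views over one ArrayBuffer: distinct objects, overlapping storage.
const buffer = new ArrayBuffer(8);
const bytes = new Uint8Array(buffer);        // covers bytes 0..7
const words = new Uint32Array(buffer, 4, 1); // covers bytes 4..7 only

// Write through one view...
words[0] = 0xffffffff; // every byte is 0xff, so endianness does not matter

// ...and the store is visible through the other view.
const seenThroughBytes = bytes[4]; // 255
const untouchedByte = bytes[0];    // 0, outside the range of `words`

// Plain object: own data properties 'a' and 'b' never share storage,
// so a store to o.a cannot change o.b.
const o = { a: 1, b: 2 };
o.a = 42;
const bUnchanged = o.b; // 2
```

A compiler that sees a store to 'words[0]' followed by a load from 'bytes[4]' cannot reorder or eliminate either one without first proving that the two views do not overlap.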

So it appears that the most realistic goal is (3).  In that case, I can't imagine a case where arrays being expandos but struct types being totally frozen will make the task of struct mapping to native code any harder.  If you're a programmer who doesn't want a typed array to have custom properties, then you won't give it custom properties - simple as that.  No need to enforce the invariant.
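
As a rough illustration of "opt in if you want it", the sketch below assumes an engine in which typed arrays are extensible by default (the behavior ES2015 ended up specifying; it was not universal when this thread was written).  The property name 'label' is a hypothetical piece of metadata:

```javascript
// A typed array carrying metadata as a named property (an "expando").
const samples = new Float32Array(4);
samples.label = "left channel";

// A programmer who does not want custom properties can enforce that
// per object instead of having the language enforce it for everyone.
const locked = new Float32Array(4);
Object.preventExtensions(locked);

// Reflect.set reports failure with `false` rather than throwing,
// so this behaves the same in strict and sloppy code.
const added = Reflect.set(locked, "label", "right channel");

// Indexed element writes are unaffected by non-extensibility.
locked[0] = 1.5;
```

Here 'added' is false and 'locked' has no 'label' property, while 'samples.label' reads back normally.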

> 
> Sfink's point about structured clone is good, except he wrote "structured clone" and then angels cried... tears of blood.
>> 
>> I don't quite buy that this helps users; most of the objects in your program are going to allow custom properties to be added at any point.  That's kind of the whole point of programming in a dynamic language.  So having one type where it's disallowed doesn't help to clarify thinking.
> 
> There are other such types a-coming :-).

And I'll be grumpy about some of those, too. ;-)

>> 
>> I also don't buy that it makes anything more efficient.  We only incur overhead from named properties if you actually add named properties to a typed array, and in that case we incur roughly the overhead you'd expect (those named properties are a touch slower than named properties on normal objects, and you obviously need to allocate some extra space to store those named properties).
>> 
> 
> Honest q: couldn't you squeeze one more word out if JSC typed arrays were non-extensible?

I'd love to hear about this from the SM and V8 peeps.  Here's my take.  A typed array *must* know about the following bits of information:

T: Its own type.
B: A base pointer (not the buffer but the thing you index off of).
L: Its length.

But that only works if it owns its buffer; that is, it was allocated using, for example, "new Int8Array(100)" and you never used the .buffer property.  So in practice you also need:

R: Reserved space for a pointer to a buffer.

Now observe that 'R' can be reused for either a buffer pointer or a pointer to overflow storage for named properties.  If you have both a buffer and overflow storage, you can save room in the overflow storage for the buffer pointer (i.e. displace the buffer pointer into the property storage).  We play a slightly less ambitious trick, where R either points to overflow storage or NULL.  Most typed arrays don't have a .buffer, but once they get one, we allocate overflow storage and reserve a slot in there for the buffer pointer.  So you pay *one more* word of overhead for typed arrays with buffers even if they don't have named properties.  I think that's probably good enough - I mean, in that case, you have a freaking buffer object as well so you're not exactly conserving memory.

But, using R as a direct pointer to the buffer would be a simple hack if we really felt like saving one word when you also already have a separate buffer object.

I could sort of imagine going further and using T as a displaced pointer and saving an extra word, but that might make type checks more expensive, sometimes.

So let's do the math, on both 32-bit and 64-bit (where 64-bit implies 64-bit pointers), to see how big this would be.

32-bit:

T = 4 bytes, B = 4 bytes, L = 4 bytes, R = 4 bytes.  So, you get 16 bytes of overhead for most typed arrays, and 20 if you need to use R as an overflow storage pointer and displace the buffer pointer into the overflow storage.

64-bit:

T = 8 bytes, B = 8 bytes, L = 4 bytes, R = 8 bytes.  This implies you have 4 bytes to spare if you want objects 8-byte aligned (we do); we use this for some extra bookkeeping.  So you get 32 bytes of overhead for most typed arrays, and 40 if you need to use R as an overflow storage pointer and displace the buffer pointer into the overflow storage.
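
The arithmetic above can be checked with a small sketch.  The function names are hypothetical, and the model is only what the text states: T, B and R are pointer-sized, L is 4 bytes, the header is rounded up to 8 bytes, and a displaced buffer pointer costs one extra pointer-sized word in overflow storage.

```javascript
// Round n up to a multiple of `alignment` (which must be a power of two).
function alignUp(n, alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: per-object overhead for the T/B/L/R layout.
function typedArrayOverhead(pointerBytes) {
  const T = pointerBytes; // type
  const B = pointerBytes; // base pointer used for indexing
  const L = 4;            // length, 32 bits on both platforms
  const R = pointerBytes; // reserved: overflow storage pointer or NULL
  const header = alignUp(T + B + L + R, 8);
  return {
    common: header,
    // Buffer pointer displaced into overflow storage: one more word.
    withDisplacedBuffer: header + pointerBytes,
  };
}

const on32 = typedArrayOverhead(4); // { common: 16, withDisplacedBuffer: 20 }
const on64 = typedArrayOverhead(8); // { common: 32, withDisplacedBuffer: 40 }
```

On 64-bit the raw sum is 28 bytes; the 4 bytes of padding up to 32 are the spare space mentioned above.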

As far as I can tell, this object model compresses typed arrays about as much as they could be compressed while also allowing them to be extensible.  The downside is that you pay a small penalty for typed arrays that have an "active" buffer, in the sense that you either accessed the .buffer property or you constructed the typed array using a constructor that takes a buffer as an argument.
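
For reference, the two ways a typed array ends up with an "active" buffer look like this at the JS level.  This is only a sketch of the observable API; whether an engine defers creating the buffer object until it is asked for is an implementation detail that script cannot see.

```javascript
// Owns its storage; .buffer is never read, so the cheap path can apply.
const owned = new Int8Array(100);

// Active buffer, case 1: constructed on top of an existing ArrayBuffer.
const shared = new ArrayBuffer(100);
const view = new Int8Array(shared, 10, 20); // byteOffset 10, length 20

// Active buffer, case 2: .buffer is read on an array that owned its storage.
const other = new Int8Array(100);
const exposed = other.buffer; // the same object is returned on later reads
```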

-Filip


> 
> /be
> 
>> -Filip
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>> Oliver Hunt <mailto:oliver at apple.com>
>> August 27, 2013 9:35 AM
>> Existing types with magic index properties (other than Array) just drop numeric expandos on the floor so it's logically a no-op. Unless there was a numeric accessor on the prototype (which non-extensibility does not save you from).
>> 
>> My complaint is that this appears to be removing functionality that has been present in the majority of shipping TA implementations, assuming from LH's comment that Chakra supports expandos.
>> 
>> --Oliver
>> 
>> 
>> 
>> Domenic Denicola <mailto:domenic at domenicdenicola.com>
>> August 27, 2013 9:26 AM
>> I am not aware of all the nuances of the discussion, but as a developer I would find the behavior for numeric expandos confusing. For a typed array of length 1024, setting `ta[1023]` would do something completely different from setting `ta[1024]`. Unlike normal arrays, setting `ta[1024]` would not change `ta.length`, and presumably `ta[1024]` would not be exposed by the various iteration facilities.
>> 
>> I would much rather receive a loud error (in strict mode), which will either alert me to my code being weird, or possibly to my code committing an off-by-one error.
>> 
>> Oliver Hunt <mailto:oliver at apple.com>
>> August 27, 2013 9:18 AM
>> The current argument for non-extensibility seems to be mozilla doesn't support them.  It sounds like all other engines do.
>> 
>> There are plenty of reasons developers may want expandos - they're generally useful for holding different kinds of metadata.  By requiring a separate object to hold that information we're merely making a developer's life harder.  This is also inconsistent with all other magically-indexable types in ES and the DOM.
>> 
>> I'm also not sure what the performance gains of inextensibility are, if DH could expand it would be greatly appreciated.
>> 
>> --Oliver
>> 
>> 
>> 
>> Allen Wirfs-Brock <mailto:allen at wirfs-brock.com>
>> August 27, 2013 9:04 AM
>> see meeting notes https://github.com/rwaldron/tc39-notes/blob/master/es6/2013-07/july-24.md#54-are-typedarray-insances-born-non-extensible 
>> 
>> 
>> 
