Non-extensibility of Typed Arrays
fpizlo at apple.com
Wed Sep 4 16:15:15 PDT 2013
On Sep 4, 2013, at 3:09 PM, Brendan Eich <brendan at mozilla.com> wrote:
>> Filip Pizlo <mailto:fpizlo at apple.com>
>> September 4, 2013 12:34 PM
>> My point is that having custom properties, or not, doesn't change the overhead for the existing typed array spec and hence has no effect on small arrays. The reasons for this include:
>> - Typed arrays already have to be objects, and hence have a well-defined behavior on '=='.
>> - Typed arrays already have to be able to tell you that they are in fact typed arrays, since JS doesn't have static typing.
>> - Typed arrays already have prototypes, and those are observable regardless of expandability. A typed array from one global object will have a different prototype than a typed array from a different global object. Or am I misunderstanding the spec?
>> - Typed arrays already have to know about their buffer.
>> - Typed arrays already have to know about their offset into the buffer. Or, more likely, they have to have a second pointer that points directly at the base from which they are indexed.
>> - Typed arrays already have to know their length.
>> You're not proposing changing these aspects of typed arrays, right?
> Of course not, but for very small fixed length arrays whose .buffer is never accessed, an implementation might optimize harder.
As I said, of course you can do this, and one way you could "try harder" is to put the buffer pointer in a side table. The side table maps array object pointers to their buffers, and you only make an entry in this table if .buffer is mentioned.
But if we believe that this is a sensible thing for a VM to do - and of course it is! - then the same thing can be done for the custom property storage pointer.
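A rough sketch of the lazy side table for .buffer, modelled in user-level JavaScript rather than in VM internals. The names bufferTable and SmallInt32Array are hypothetical, invented for illustration; a real engine would do this below the level of the language, and would also have to keep the elements and the buffer aliased once the buffer exists, which this model does not attempt.

```javascript
// Hypothetical model of a VM-internal side table, written in plain JS.
// Maps array object -> ArrayBuffer; an entry exists only once .buffer is read.
const bufferTable = new WeakMap();

class SmallInt32Array {
  constructor(values) {
    // Elements live inline in the object; no buffer field is allocated up front.
    this.elements = Array.from(values, (v) => v | 0);
  }

  get buffer() {
    let buf = bufferTable.get(this);
    if (buf === undefined) {
      // First mention of .buffer: materialize it and record it on the side.
      buf = new ArrayBuffer(this.elements.length * 4);
      new Int32Array(buf).set(this.elements);
      bufferTable.set(this, buf);
    }
    return buf;
  }
}

const a = new SmallInt32Array([1, 2, 3]);
const b = new SmallInt32Array([4, 5, 6]);
const hadEntryBefore = bufferTable.has(a); // no entry until .buffer is touched
const firstBuffer = a.buffer;              // creates the entry for a only
```

Arrays like b, whose .buffer is never read, never get a table entry, so they pay nothing for the field.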
> It's hard for me to say "no, Filip's analysis shows that's never worthwhile, for all time."
>> The super short message is this: so long as an object obeys object identity on '==' then you can have "free if unused, suboptimal if you use them" custom properties by using a weak map on the side. This is true of typed arrays and it would be true of any other object that does object-style ==. If you allocate such an object and never add a custom property then the weak map will never have an entry for it; but if you put custom properties in the object then the map will have things in it. But with typed arrays you can do even better as my previous message suggests: so long as an object has a seldom-touched field and you're willing to eat an extra indirection or an extra branch on that field, you can have "free if unused, still pretty good if you use them" custom properties by displacing that field. Typed arrays have both of these properties right now and so expandability is a free lunch.
> The last sentence makes a "for-all" assertion I don't think implementations must be constrained by.
How so? It is true that some VM implementations will be better than others. But ultimately every VM can implement every optimization that every other VM has; in fact my impression is that this is exactly what is happening as we speak.
So, it doesn't make much sense to make language design decisions because it might make some implementor's life easier right now. If you could argue that something will never be efficient if we add feature X, then that might be an interesting argument. But as soon as we identify one sensible optimization strategy for making something free, I would tend to think that this is sufficient to conclude that the feature is free and there is no need to constrain it. If we don't do this then we risk adding cargo-cult performance features that rapidly become obsolete.
> Small fixed-length arrays whose .buffer is never accessed (which an implementation might be able to prove by type inference) could be optimized harder.
And my point is that if you do so, then the same technique can be trivially applied to the custom property storage pointer.
> The lack of static types in JS does not mean exactly one implementation representation must serve for all instances of a given JS-level abstraction. We already have strings optimized variously in the top VMs, including Chords or Ropes, dependent strings, different character sets, etc.
>> Still find this discussion amusing? Here's the long story: It is these things that I list above that lead to a 16-byte overhead on 32-bit, and a 32-byte overhead on 64-bit in the best "sane" case. Giving typed array objects expandability doesn't add to this overhead, because two of the fields necessary to implement the above (the type, and the buffer) can be displaced for pointing to property storage. Any imaginable attempt to reduce the overhead incurred by the information - using BBOP (big bag of pages) for the type, using an out-of-line weak map for the buffer or the type, encoding some of the bits inside the pointer to the typed array, etc. - can also be used to eradicate any space overhead you'd need for custom properties, so long as you're on board with the "free if unused, sub-optimal if you use them" philosophy.
> For something like decimal, it matters whether there's an empty side table and large-N decimal instances of total size N*S, vs. N*(S+K) for some constant K we could eliminate by specializing harder. Even better if we agree that decimal instances should be non-extensible (and have value not reference semantics -- more below).
With a side table, the constant K = 0 even if you have custom properties. The table will only have an entry for those instances that had custom properties.
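To make the K = 0 claim concrete, here is a minimal sketch of a custom-property side table in plain JavaScript. The helper names setCustom and getCustom and the tableEntries counter are assumptions made up for this example (WeakMap has no size, so entries are counted by hand); an engine would keep such a table internally.

```javascript
// Hypothetical side table: object -> Map of its custom properties.
const customProps = new WeakMap();
let tableEntries = 0; // WeakMap exposes no .size, so count insertions by hand

function setCustom(obj, key, value) {
  let props = customProps.get(obj);
  if (props === undefined) {
    // First custom property on this object: only now does it cost anything.
    props = new Map();
    customProps.set(obj, props);
    tableEntries += 1;
  }
  props.set(key, value);
}

function getCustom(obj, key) {
  const props = customProps.get(obj);
  return props === undefined ? undefined : props.get(key);
}

// 1000 instances, only two of which ever receive custom properties.
const instances = Array.from({ length: 1000 }, () => new Float64Array(2));
setCustom(instances[0], "label", "origin");
setCustom(instances[0], "unit", "px");
setCustom(instances[7], "label", "seventh");
```

After this runs, the table holds two entries for a thousand instances; the other 998 carry no per-instance cost for expandability.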
>> - If the VM wants to go further and create immediate representations of some or all Int64's, similarly to what VMs do for JS small integers today, then the main problem you run into is object identity: does Int64(1).add(Int64(1)) == Int64(1).add(Int64(1))? A naive JS implementation of an Int64 class would say that this is false, since it's likely to allocate a new Int64 each time. But an immediate representation would have no choice but to say true. You can work around this if you say that the VM's implementation of Int64 operations behaves /as if/ the add()/sub()/whatever() methods used a singleton cache. You can still then have custom properties; i.e. you could do Int64(2).foo = 42 and then Int64(1).add(Int64(1)).foo will return 42, since the VM can keep an immediate-int64-to-customproperties map on the side. That's kind of analogous to how you could put a setter on field '2' of Array.prototype and do some really hilarious things.
> The value objects proposal for ES7 is live, I'm championing it. It does not use (double-dispatch for dyadic) operators as methods. It does not use extensible objects.
> Warning: both are slightly out of date, I'll be updating the strawman over the next week.
Thanks for the links! To clarify, I'm not trying to make a counterproposal - the above was nothing more than a fun thought experiment and I shared it to motivate why I think that custom properties are free.
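For the curious, the "as if there were a singleton cache" behavior from that thought experiment can be approximated in user-level JavaScript. This is only a sketch: Int64Model and internTable are hypothetical names, BigInt stands in for the 64-bit payload, and the custom properties sit directly on the canonical object instead of in a separate immediate-to-properties map. The table also holds its entries strongly, which a real implementation would need to address.

```javascript
// Hypothetical interning table: 64-bit value (as BigInt) -> canonical instance.
const internTable = new Map();

class Int64Model {
  constructor(value) {
    this.value = value; // BigInt already wrapped to the signed 64-bit range
  }

  // Always hand back the one canonical object for a given 64-bit value.
  static of(n) {
    const v = BigInt.asIntN(64, BigInt(n));
    let box = internTable.get(v);
    if (box === undefined) {
      box = new Int64Model(v);
      internTable.set(v, box);
    }
    return box;
  }

  add(other) {
    // Going through of() is what makes equal results identical objects.
    return Int64Model.of(this.value + other.value);
  }
}

const two = Int64Model.of(2);
two.foo = 42; // custom property on the canonical "2"

const sum = Int64Model.of(1).add(Int64Model.of(1));
const sameObject = sum === two; // identity holds, as an immediate would require
const seenFoo = sum.foo;        // the custom property is visible through the sum
```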
My understanding is that you are still arguing that custom properties are not free, and that they incur some tangible cost in terms of space and/or time. I'm just trying to show you why they don't if you do the same optimizations for them that have become acceptable for a lot of other JS corners. Unless you think that ES should have an "ease of implementation" bar for features. I wouldn't necessarily mind that, but my impression is that this is not the case.
> With value objects, TC39 has definitely favored something that I think you oppose, namely extending JS to have (more) objects with value not reference semantics, which requires non-extensibility.
> If I have followed your messages correctly, this is because you think non-extensibility is a rare case that should not proliferate.
I have two points here:
- Typed arrays already have so much observable objectyness that making them non-extensible feels arbitrary; this is true regardless of the prevalence, or lack thereof, of non-extensibility.
- At the same time, I do think that non-extensibility is a rare case and I don't like it.
> But with ES5 Object.preventExtensions, etc., the horse is out of the barn.
It's there and we have to support it, and the fact that you can do preventExtensions() to an object is a good thing. That doesn't mean it should become the cornerstone for every new feature. If a user wants to preventExtensions() on their object, then that's totally cool - and I'm not arguing that it isn't.
The argument I'm making is a different one: should an object be non-expandable by default?
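The difference between opting in and being non-expandable by default can be shown in a few lines. This sketch assumes an engine that follows ES2015 or later, where typed arrays are extensible unless the user says otherwise; the variable names are made up for the example. Reflect.set is used so the result does not depend on strict versus sloppy mode.

```javascript
// Expandable by default: a custom property simply works.
const ta = new Uint8Array(4);
ta.label = "header bytes";
const extensibleByDefault = Object.isExtensible(ta);

// Opt-in: the user explicitly asks for a non-extensible object.
const locked = Object.preventExtensions(new Uint8Array(4));

// Reflect.set reports failure as false instead of throwing or silently ignoring.
const accepted = Reflect.set(locked, "label", "nope");
const stillExtensible = Object.isExtensible(locked);
```

Here the restriction is a choice made per object by the user, not a property forced on every instance of the type.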
I keep hearing arguments that this somehow makes typed arrays more efficient. That's like arguing that there exists a C compiler, somewhere, that becomes more efficient if you label your variables as 'register'. It's true that if you're missing the well-known optimization of register allocation then yes, 'register' is an optimization. Likewise, if you're missing the well-known object model optimizations like pointer displacement, BBOPs, or other kinds of side tables, then forcing objects to be non-extensible is also an optimization. That doesn't mean that we should bake it into the language. VM hackers can just implement these well-known optimizations and just deal with it.
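As an illustration of pointer displacement, here is a small model in plain JavaScript. It is a sketch under assumptions, not any engine's actual layout: ModelTypedArray, PropertyStorage, and the single slot field are hypothetical. The seldom-touched buffer pointer shares one slot with the property storage pointer, at the price of one extra branch when .buffer is read.

```javascript
// Hypothetical out-of-line record, allocated only when custom properties appear.
class PropertyStorage {
  constructor(buffer) {
    this.buffer = buffer;   // the displaced field now lives here
    this.props = new Map(); // custom properties
  }
}

class ModelTypedArray {
  constructor(length) {
    this.length = length;
    // One slot: holds either the ArrayBuffer or a PropertyStorage record.
    this.slot = new ArrayBuffer(length);
  }

  get buffer() {
    // The extra branch (and indirection) paid on the seldom-touched field.
    return this.slot instanceof PropertyStorage ? this.slot.buffer : this.slot;
  }

  setCustom(key, value) {
    if (!(this.slot instanceof PropertyStorage)) {
      // First custom property: displace the buffer pointer into the record.
      this.slot = new PropertyStorage(this.slot);
    }
    this.slot.props.set(key, value);
  }

  getCustom(key) {
    return this.slot instanceof PropertyStorage
      ? this.slot.props.get(key)
      : undefined;
  }
}

const t = new ModelTypedArray(8);
const before = t.buffer;
t.setCustom("tag", 42);            // slot now points at a PropertyStorage
const untouched = new ModelTypedArray(8); // never displaced, no extra record
```

The object itself stays the same size whether or not it has custom properties; only objects that actually use them pay for the extra record.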
> At a deeper level, the primitives wired into the language, boolean number string -- in particular number when considering int64, bignum, etc. -- can be rationalized as value objects provided we make typeof work as people want (and work so as to uphold a == b && typeof a == typeof b <=> a === b).
I think making int64/bignum be primitives is fine. My only point is that whether or not you make them expandable has got nothing to do with how much memory they use.
> This seems more winning in how it unifies concepts and empowers users to make more value objects, than the alternative of saying "the primitives are legacy, everything else has reference semantics" and turning a blind eye, or directing harsh and probably ineffective deprecating words, to Object.preventExtensions.
Well, this is all subjective. Objects being expandable by default is a unifying concept. The only thing that expandability of typed arrays appears to change is the interaction with binary data - but that isn't exactly a value object system as much as it is a playing-with-bits system. I'm not sure that having oddities there changes much.