Non-extensibility of Typed Arrays

Filip Pizlo fpizlo at apple.com
Wed Sep 4 23:14:04 PDT 2013


On Sep 4, 2013, at 10:11 PM, K. Gadd <kg at luminance.org> wrote:

> Did anyone address what should be done in the use case where it's necessary for information to 'tag along' with an array or typed array, for interop purposes? The existence of interior binary data objects seems to complicate this further; for example, I had said that WeakMap seems to allow attaching information to a typed array in that case even if it isn't extensible. If interior objects lose identity, though, it becomes *literally impossible* for data to follow an instance of Uint32Array (or whatever) around the runtime, which is kind of troubling. Obviously I understand *why* this is the case for interior objects.
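> 
> For concreteness, the WeakMap approach looks something like this (a sketch; the names are illustrative):
> 
>     var metadata = new WeakMap();
>     var view = new Uint32Array(8);
>     metadata.set(view, { source: "decoder" }); // works even if view is non-extensible
>     metadata.get(view).source;                 // "decoder"
> 
> This is exactly what stops working once each extraction of an interior object has a fresh identity.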
> 
> Is the meaning of an assignment to an interior object well specified? The data is copied from the source typed array into the interior object, I assume.
> 
> I'm going to describe how I understand things and from that how it seems like they could work:
> At present when you construct a typed array it is a view over a particular buffer. You can construct an array with a size, e.g. `new Uint32Array(32)`, in which case a buffer is allocated for you behind the scenes; or you can construct an array from a buffer + offset/size pair in order to create a view over a subregion of the buffer. In both cases, the 'array' does not actually represent or contain the data; it is merely a proxy of sorts through which you can access elements of a particular type.
> It is my understanding that this is the same for binary data types: you can construct a heap instance of one, in which case it has an invisible backing buffer, or you can 'construct' one from an existing buffer+offset, in which case it is more like a proxy that represents the given data type at that given offset in the buffer, and when you manipulate the proxy you are manipulating the content of the buffer.
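> 
> In strawman terms that might look like this (a sketch; the binary data API was still in flux, so the names here are illustrative):
> 
>     var buf = new ArrayBuffer(64);
>     var Point = new StructType({ x: uint32, y: uint32 });
>     var p = new Point();        // heap instance with an invisible backing buffer
>     var q = new Point(buf, 16); // proxy over the 8 bytes of buf at offset 16
>     q.x = 7;                    // writes straight into buf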
> 
> In both cases, I believe it is consistent that these objects are all 'views' or 'proxies', not actual data. The fact that you can create an instance directly creates the *illusion* of them being actual data but in every case it is possible for multiple instances to share the same backing store without sharing referential identity (via ===).
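> 
> Concretely, for typed arrays (a sketch):
> 
>     var buf = new ArrayBuffer(64);
>     var a = new Uint32Array(buf);        // view over all 16 elements
>     var b = new Uint32Array(buf, 0, 16); // second view over the same bytes
>     a[0] = 42;
>     b[0];    // 42 -- same backing store
>     a === b; // false -- distinct view objects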
> 
> In both cases, I don't believe a user should expect that attaching an expando to one object instance should modify the expandos on another object instance. Given this, it seems perfectly reasonable to be able to attach expandos to a typed array, and I've previously described why this use case is relevant (interop between compilers targeting JS, and native hand-written JS, for one).
> 
> In the same sense, if typed arrays must be constructed to act as proxies for the 'interior' arrays in a binary data type, being able to attach expandos to them does not cause much harm, other than the fact that the lifetime of the expando does not match the lifetime of the underlying binary data. But this is already true for typed arrays, in a sense.
> 
> I think the best way to address the confusion of expandos on interior arrays is simply non-extensibility, as has been discussed. I don't see why non-extensibility for interior arrays requires crippling the functionality of typed arrays in general, since JS already exposes 2-3 concepts in this area (seal, freeze, preventExtensions) along with query methods to find out whether those concepts apply to a given object (isSealed, isFrozen, isExtensible).
> 
> If interior arrays are not extensible, I should hope that Object.isExtensible for them returns false. If it were to return true when they have no expando support that would be incredibly confusing.
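> 
> That is, the expectation is just the existing ES5 contract (a sketch):
> 
>     var o = Object.preventExtensions({});
>     Object.isExtensible(o); // false
>     o.expando = 1;          // silently ignored; TypeError in strict mode
>     o.expando;              // undefined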
> 
> Anyway, given all this I would propose that the optimal solution (in terms of usability, at least - can't speak for the performance consequences) is for typed arrays to be extensible by default, as they are Objects that point to underlying sequences of elements, just like Array. This gives good symmetry and lets you cleanly substitute a typed array for an Array in more cases (resizability and mixed types being the big remaining differences). In cases where extensibility is a trap for the unwary or actively undesirable, like interior objects, the instance should be made non-extensible. This allows all end user code to handle cases where it is passed an interior array or object without reducing the usefulness of typed arrays.

I can sort of buy that this:

var x = struct.f; // it's an interior array

could be non-extensible.  But it all still feels a bit odd.

I think I prefer non-extensibility of typed arrays over sometimes-extensibility.

> 
> FWIW I would also argue that a free-standing instance of any Binary Data type (one that you construct with new, not using an existing buffer) should maybe be extensible by default as well, even if 'interior' instances are not. However, making binary data types always non-extensible wouldn't exactly break any compatibility or use cases, since they're a new feature - but it does mean we now have to add checks for extensibility/typeof in more cases, which is awful...
> 
> (A related area where this is a big problem for me and authors of similar packages is emulating the java/C# 'getHashCode' pattern, where objects all have an associated static hash code. Implementing this often requires attaching the computed hash to the object as an expando or via some other association like WeakMap. I think interior objects in binary data break this fundamentally, which is painful.)
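> 
> For reference, the usual shape of that pattern (a sketch; getHashCode is an illustrative name, not a built-in):
> 
>     var hashCodes = new WeakMap();
>     var nextHash = 1;
>     function getHashCode(obj) {
>       if (!hashCodes.has(obj)) hashCodes.set(obj, nextHash++);
>       return hashCodes.get(obj); // stable for a given object identity
>     }
> 
> With interior objects, every extraction is a fresh identity, so the map never hits and the "same" object hashes differently each time.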
> 
> 
> On Wed, Sep 4, 2013 at 7:29 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> 
> On Sep 4, 2013, at 5:25 PM, Brendan Eich <brendan at mozilla.com> wrote:
> 
>> Filip Pizlo wrote:
>>>>> Typed arrays have both of these properties right now and so expandability is a free lunch.
>>>> 
>>>> The last sentence makes a "for-all" assertion I don't think implementations must be constrained by. 
>>> 
>>> How so? It is true that some VM implementations will be better than others. But ultimately every VM can implement every optimization that every other VM has; in fact my impression is that this is exactly what is happening as we speak.
>> 
>> My "for-all" referred to all typed arrays across all VMs, not just all VMs.
>> 
>> Also just as a point of fact (something "done", the Latin root means "deed"), I do not see the same optimizations being used in all VMs. For example, SpiderMonkey's TI (written up here: http://rfrn.org/~shu/drafts/ti.pdf [PLDI 2012]) is not being used elsewhere AFAIK -- please correct me if I'm mistaken.
> 
> Interesting point.  Equivalent optimizations are being done.  Other VMs also infer types one way or another.  And I'd argue that my way of inferring types is the best - it incurs smaller overheads for start-up while achieving more precise results.  (Of course I must say that - I stand by my stuff, heh.)  That being said, I do think that FF's TI is really cool and loved reading that paper.
> 
> It's kind of like JVMs: all of the big-league ones do speculative inlining - but they do it in radically different ways and rely on different kinds of feedback, and if you go to a conference where JVM hackers show up, they will argue about which is best.  I have fond memories of Sun vs. IBM vs. Oracle shouting matches about how you do deoptimization, whether you do deoptimization at all, and what you need to analyze and prove things about the class hierarchy.  That doesn't change the basics: they all do speculative inlining and it performs sort of the same in the end.
> 
> I suspect that the same thing is becoming true of typed arrays, regardless of whether they are extensible or not.  I guess that when I said "every optimization that every other VM has" I didn't mean literally using the same exact algorithm - just performing optimizations that achieve equivalent results.
> 
>> 
>>> So, it doesn't make much sense to make language design decisions because it might make some implementor's life easier right now. If you could argue that something will /never/ be efficient if we add feature X, then that might be an interesting argument. But as soon as we identify one sensible optimization strategy for making something free, I would tend to think that this is sufficient to conclude that the feature is free and there is no need to constrain it. If we don't do this then we risk adding cargo-cult performance features that rapidly become obsolete.
>> 
>> I agree that's a risk. I'm also with Niko in wanting to argue about what the semantics should be without appealing to performance arguments.
> 
> Right!  I guess my first order argument is that performance *isn't* an argument in favor of non-expandability.
> 
>> 
>> However, I still think you are verging on promising a free lunch. Not all methods in C++ can affordably be virtual. Expandos in JS cost. At fine enough grain, even pretty-well-predicted branches cost. Nothing is free-enough to discount forever in my bitter and long experience :-P.
> 
> I am promising a free lunch!  Virtual methods in C++ are only expensive because C++ still doesn't have feedback-driven optimization.  JVMs make them free in Java.  And they are free.  Period.  There is no upside to marking a method final in Java.  I am arguing that expandos are similar.
> 
>> 
>>>> The lack of static types in JS does not mean exactly one implementation representation must serve for all instances of a given JS-level abstraction. We already have strings optimized variously in the top VMs, including Chords or Ropes, dependent strings, different character sets, etc.
>>>>> 
>>>>> Still find this discussion amusing? Here's the long story: It is these things that I list above that lead to a 16-byte overhead on 32-bit, and a 32-byte overhead on 64-bit, in the best "sane" case. Giving typed array objects expandability doesn't add to this overhead, because two of the fields necessary to implement the above (the type, and the buffer) can be displaced to point at property storage. Any imaginable attempt to reduce the overhead incurred by the information - using BBOP (big bag of pages) for the type, using an out-of-line weak map for the buffer or the type, encoding some of the bits inside the pointer to the typed array, etc. - can also be used to eradicate any space overhead you'd need for custom properties, so long as you're on board with the "free if unused, sub-optimal if you use them" philosophy.
>>>> 
>>>> For something like decimal, it matters whether there's an empty side table and large-N decimal instances of total size N*S, vs. N*(S+K) for some constant K we could eliminate by specializing harder. Even better if we agree that decimal instances should be non-extensible (and have value not reference semantics -- more below).
>>> 
>>> With a side table, the constant K = 0 even if you have custom properties. The table will only have an entry for those instances that had custom properties.
>> 
>> I know, that's why I was attacking the non-side-table approach.
>> 
>> But the side table has its own down-side trade-offs: GC complexity, even costlier indirection, and strictly greater implementation complexity. If one could implement without having to mess with this K ?= 0 design decision and hassle with packing or else using a side-table, one's VM would be smaller, simpler, less buggy -- all else equal.
> 
> Meh, I'm just reusing the GC complexity that the DOM already introduces.
> 
>> 
>> Now you may say that I'm betraying my hero Mr. Spock, whom I have invoked to argue that implementors should sacrifice so the mass of JS users can live long and prosper.
> 
> Yes, you are. ;-)
> 
>> 
>> And you'd have me dead to rights -- if I thought JS users wanted expandos on binary data, that the lack of expandos there was a problem akin to the whole starship being blown up. But I do not believe that's the case.
>> 
>> If users don't care, then implementors should get a break and VMs should be simpler, ceteris paribus.
> 
> Fair enough.
> 
>> 
>>>>> - If the VM wants to go further and create immediate representations of some or all Int64's, similarly to what VMs do for JS small integers today, then the main problem you run into is object identity: does Int64(1).add(Int64(1)) == Int64(1).add(Int64(1))? A naive JS implementation of an Int64 class would say that this is false, since it's likely to allocate a new Int64 each time. But an immediate representation would have no choice but to say true. You can work around this if you say that the VM's implementation of Int64 operations behaves /as if/ the add()/sub()/whatever() methods used a singleton cache. You can still then have custom properties; i.e. you could do Int64(2).foo = 42 and then Int64(1).add(Int64(1)).foo will return 42, since the VM can keep an immediate-int64-to-customproperties map on the side. That's kind of analogous to how you could put a setter on field '2' of Array.prototype and do some really hilarious things.
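>>>>> 
>>>>> A sketch of that side map (Int64 is hypothetical, and this is conceptual pseudocode for what the VM would maintain, not user-visible API):
>>>>> 
>>>>>     // VM-internal: immediate Int64 bits -> property bag
>>>>>     var expandos = new Map();
>>>>>     // Int64(2).foo = 42       ~> expandos.set(bitsOf(2), { foo: 42 })
>>>>>     // Int64(1).add(Int64(1))  ~> same bits, so reading .foo yields 42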
>>>> 
>>>> The value objects proposal for ES7 is live, I'm championing it. It does not use (double-dispatch for dyadic) operators as methods. It does not use extensible objects.
>>>> 
>>>> http://wiki.ecmascript.org/doku.php?id=strawman:value_objects
>>>> http://www.slideshare.net/BrendanEich/value-objects
>>>> 
>>>> Warning: both are slightly out of date, I'll be updating the strawman over the next week.
>>> 
>>> Thanks for the links! To clarify, I'm not trying to make a counterproposal - the above was nothing more than a fun thought experiment and I shared it to motivate why I think that custom properties are free.
>>> 
>>> My understanding is that you are still arguing that custom properties are not free, and that they incur some tangible cost in terms of space and/or time. I'm just trying to show you why they don't if you do the same optimizations for them that have become acceptable for a lot of other JS corners. Unless you think that ES should have an "ease of implementation" bar for features. I wouldn't necessarily mind that, but my impression is that this is not the case.
>> 
>> I do think implementor ease, or really implementation simplicity, should be a concern. It's secondary, per Spock's Kobayashi Maru solution, to the needs of the many JS users. But it's not nothing. Part of the impetus for Dart, I'm told, is the complexity of V8 required by JS-as-it-is. Whatever the case there, standardized JS extensions should not add too much complexity if we can help it.
>> 
>> I'll lay off performance concerns but you'll still see me, like Ahab lashed to the white whale, beckoning against free lunch arguments or anything near them :-P.
> 
> My job is to give people a free lunch in the performance department.  So I live by free lunch arguments.
> 
>> 
>>>> With value objects, TC39 has definitely favored something that I think you oppose, namely extending JS to have (more) objects with value not reference semantics, which requires non-extensibility.
>>> 
>>> Indeed.
>>> 
>>>> 
>>>> If I have followed your messages correctly, this is because you think non-extensibility is a rare case that should not proliferate. 
>>> 
>>> I have two points here:
>>> 
>>> - Typed arrays already have so much observable objectyness that making them non-extensible feels arbitrary; this is true regardless of the prevalence, or lack thereof, of non-extensibility.
>> 
>> Ok, I acknowledge this point.
>> 
>> And yet SpiderMonkey had native typed arrays from the get-go, non-extensible -- we didn't use WebIDL. So the interoperable intersection semantics developers can count on does not include extensibility. As Mark says, this allows us to standardize either way, so we need arguments that don't appeal to "feelings".
> 
> This is a good point.
> 
>> 
>>> - At the same time, I do think that non-extensibility is a rare case and I don't like it.
>> 
>> I can tell ;-). Feelings are important but to decide on a spec we will need stronger reasons.
> 
> I agree.  I'm assuming that in the grand scheme of things, specs improve when people articulate gut feelings and we reach something in the middle.
> 
>> 
>>>> But with ES5 Object.preventExtensions, etc., the horse is out of the barn.
>>> 
>>> It's there and we have to support it, and the fact that you can do preventExtensions() to an object is a good thing. That doesn't mean it should become the cornerstone for every new feature. If a user wants to preventExtensions() on their object, then that's totally cool - and I'm not arguing that it isn't.
>>> 
>>> The argument I'm making is a different one: should an object be non-expandable by default?
>>> 
>>> I keep hearing arguments that this somehow makes typed arrays more efficient. That's like arguing that there exists a C compiler, somewhere, that becomes more efficient if you label your variables as 'register'.
>> 
>> I remember when that indeed mattered.
>> 
>>> It's true that if you're missing the well-known optimization of register allocation then yes, 'register' is an optimization. Likewise, if you're missing the well-known object model optimizations like pointer displacement, BBOPs, or other kinds of side tables, then forcing objects to be non-extensible is also an optimization. That doesn't mean that we should bake it into the language. VM hackers can simply implement these well-known optimizations and deal with it.
>> 
>> Ok, let's let the performance argument rest. You can be Ishmael and live. I'm Ahab and I still stab at such nearly-free-lunch, "sufficiently smart compiler" claims :-).
>> 
>>>> At a deeper level, the primitives wired into the language, boolean number string -- in particular number when considering int64, bignum, etc. -- can be rationalized as value objects provided we make typeof work as people want (and work so as to uphold a == b && typeof a == typeof b <=> a === b).
>>> 
>>> I think making int64/bignum be primitives is fine. My only point is that whether or not you make them expandable has got nothing to do with how much memory they use.
>>> 
>>>> 
>>>> This seems more winning in how it unifies concepts and empowers users to make more value objects, than the alternative of saying "the primitives are legacy, everything else has reference semantics" and turning a blind eye, or directing harsh and probably ineffective deprecating words, to Object.preventExtensions.
>>> 
>>> Well this is all subjective. Objects being expandable by default is a unifying concept.
>> 
>> It does not unify number, boolean, string.
> 
> True.
> 
>> 
>> What's not subjective is that we have two concepts in JS today, one (ignoring null and undefined) for primitive AKA value types, the other for reference types (objects). I see a way to extend object as a concept to subsume value types, although of course unity comes at the price of complexity for object. But non-extensibility is a piece of complexity already added to object as a concept by ES5.
>> 
>> Irreducible complexity here, and perhaps "subjective" or (I prefer) "aesthetic" judgment is the only way to pick.
> 
> Is it clear that we *can't* have a better story for value types?  I just don't think that non-extensibility is sufficient.
> 
> OK, so let's back up.  Do you believe that making an object non-extensible is sufficient to make it a "value type"?  I don't.  You probably need some other stuff.
> 
> This is where I return to the objectyness point: typed arrays are already spec'd to have a bunch of heavy reference-to-object behavior.  So making them expandable is no big deal.  And making them non-expandable means that we'll now live in a weirdo world where we have four different concepts of what it means to be a value:
> 
> A) Full blown reference objects that you can do weird things to, like add properties and change __proto__, etc.  You can also make one non-extensible at your discretion, which fits into the bat-poop crazy "you can do anything" philosophy of full blown objects.  And that's great - that's the heart of the language, and I happen to enjoy it.
> 
> B) Object types that are always non-extensible but otherwise still objecty - they have a prototype that is observable, they reveal their identity via ==, and you can actually inject stuff into them by modifying the appropriate Object.prototype.
> 
> C) Values with whatever value type semantics we come up with in the future.
> 
> D) Primitives.
> 
> Now, I hope that we could get C and D to be as close as possible to each other.  But that still leaves three different behaviors.  This introduces a learning curve.  That's why (B) offends me.  It's subtly different from (A) and clearly different from either (C) or (D).
> 
> Now, we actually also have a totally alternate behavior, used by binary data.  And my argument there is that I wouldn't get too offended by binary data acting weird, because the very notion of exposing binary data is weird to begin with.  I expect it to be used only for special graphicsy stuff and not for general-purpose "value types" for normal JS programs.  So it's OK to me if binary data is both weird and inconsistent with everything else.  And no, I still don't view "typed arrays" as being part of binary data - it already appears to be the case that typed arrays have different buffer behavior from the struct types.  So they're just different.  And that's fine.
> 
>> 
>>> The only thing that expandability of typed arrays appears to change is the interaction with binary data - but that isn't exactly a value object system as much as it is a playing-with-bits system. I'm not sure that having oddities there changes much.
>> 
>> Sure, let's get back to binary data (I brought up value objects because you brought up int64).
>> 
>> Interior binary data objects will be cons'ed up upon extraction, so distinguishable by == returning false and by lack of expando preservation. Niko, Dmitry, and others take this as a sign that expandos should not be allowed, leaving only == returning false among same-named extractions as an oddity. And they further conclude that expandos should not be allowed on any binary data object (whether interior extracted, or not).
>> 
>> You argue on the contrary that JS objects in general can be extended with expandos, so why restrict binary data objects, even interior ones that are extracted? Let each such extracted interior object be != with all other same-named extractions, and let each have expandos assigned that (vacuously) won't be preserved on next extraction.
>> 
>> I hope I have stated positions accurately.
> 
> Yup!
> 
>> If so I'll tag out of the ring, in hopes of someone else bringing new arguments to bear.
>> 
>> /be