What is an array?

Allen Wirfs-Brock allen at wirfs-brock.com
Wed Mar 30 16:37:21 PDT 2011


That wiki page and the working document it links to was intended to stir up exactly these sorts of questions.  There are recurring discussions about various situations where it may be useful to categorize  ECMAScript objects and how to go about creating objects that fit into various categories.  Proxies exasperates these issues because they permit anyone to fairly easily creates new kinds of objects with extraordinary behaviors.  Rather than just adding new ad hoc categorization tests or  additional operations to manufacture specific "types" of objects I think we need to see if we can first establish some sort of principled foundation for doing so.   I also think it is important that we try to do this before Proxies are completely frozen as we need to make sure that they can support what we come up with.

On Mar 30, 2011, at 3:30 AM, David Bruant wrote:

> Hi,
> I have recently read http://wiki.ecmascript.org/doku.php?id=strawman:es5_internal_nominal_typing and in particular "Array.isArray should test for presence of special [[DefineOwnProperty]] rather than testing [[Class]" (There is a typo, by the way) which has woken up one fundamental question which is "what is an array?". The consequence of this is "for what Array.isArray should return true or false?"

Yes, in ES5 I think we were too quick in defining isArray in terms of "[[Class]]"

> Starting with the defintion of "new Array" in the ES5 spec, we can see that an Array is created with a certain prototype value and the internal [[Class]] equal to "Array".
> Of course, the prototype doesn't define an Array by itself. Object.create(Array.prototype) will not return what we could consider as an Array. In a way, same thing for [[Class]], it's "just a string" and doesn't really identify what arrays are.

I think that what the ES5 specification means by "Array object" is precisely the characteristics enumerated in 15.4.5 "Properties of Array Instances" and in general when the specification says "<some built-in constructor name> object" it means a object with the characteristics enumerated in the 15.x.5 (usually) section titled "Properties of <some built-in constructor name> Instances".  Note that these sections, typically specify a [[Prototype]] value, a [[Class]] value, any special internal internal methods or internal properties, and any instance own properties. The [[Prototype]] is probably the least important of these values, from an extensibility perspective.

However, the spec. is imperfect. Many of specifications for built-in constructors neglect to directly reference their corresponding "Properties of <me> Instances" section and explicitly re-enumerate the same information that is specified in the "Properties of <me> Instances" section. Sometimes they leave things out!  (I'm going to file some spec. bugs on this).

But overall, I think that  "a <Built-in Constructor> Object" together with a reference to the corresponding 15.x.5 section is the best way for the specification to talk about the specific kinds of built-in objects in most situation situations.  Referring to individual traits (for example the [[Class]]) value is making unwarranted assumptions about the future or extensions (for example, that  [[Class]]==="Array" also implies support for a specific [[DefineOwnProperty]] semantics). 

The ES definition of Array.isArray is an example of this.  It uses a [[Class]] test because it wanted array objects imported from different top level global contexts to pass the test, even if their [[Prototype]] value was different from that used in the current context.  Because the ES5 spec. doesn't even have the concept of the multiple top level contexts so there was no way to just say it.  [[Class]]=="Array" was picked as a testable characteristic  that should be context independent but the [[DefineOwnProperty]] definition would have also been just as good a test and arguably the test should be for all internal characteristics that define Array-ness. 

Using a single characteristic test (such as [[Class]]=="Array") to also imply other uniquely associated characteristics is fine in a closed, self-contained specification like ES5 pretends to be.  But in reality, the ES5 spec. is just the foundation spec. of an extension system (for example, via host objects) and as soon as you start defining  extensions you can't any longer assume unique associations of disjoint characteristics.

> One thing that caracterizes arrays is the internal connexion between "length" and numerical properties. This link can be found in the specific [[DefineOwnProperty]]. But I think it may be too quick to conclude that Arrays are the objects with this particular internal function. One could imagine that one day, other objects have this particular [[DefineOwnProperty]] internal methods, but other [[Get]] or [[Set]] or [[HasProperty]] internal methods. In my opinion, one thing that makes an Array what it is is all internal methods and the internal consistency they create altogether. It turns out they are all the same than the Object ones and only [[DefineOwnProperty]] is different, but still, the internal consistency we rely on when using Arrays exists because of the conjunction of all internal properties.

I agree.  It is the complete set of internal methods supported by an object that characterize its language-level behavior.  In https://spreadsheets.google.com/ccc?key=0Ak51JfLL8QLYdFRCOXBRczJfRzNJSEk2eXptQ3BzalE&hl=en I've tried to categorized all objects specified by ES5 in this manner. I identified 14 "classes" of ES5 objects.

One direction I'm thinking might be appropriate is to formalize these internal method groups, name each one, and specify a contract for each.   Since the internal methods (and their mapping to Proxy traps) are essentially the ES internal meta-object protocol, these named contracts would essentially classify all built-in object primitive behavioral variations and might be useful to reflect to the user level for identify such primitive behavioral variations.  Since Proxies have the power to define new variations you would expect a Proxy writer to either re-implement an existing contract or to define a new contract.  In either case, the Proxy instances could get tagged according to their contract.

> But recently, I have created the ProxyArray library (https://github.com/DavidBruant/ProxyArray) where, on a proxy, I have recreated the internal array behavior and special relationship between 'length' and numerical indices. Leaving aside the fact that no ES engine could determine whether a proxy handler has the exact same semantics than actual array internal methods, my library doesn't create Arrays anyway. One of the reason is that, for instance, even if the engine could tell that the semantics is the same, it would certainly not apply Array-specific optimizations.

I don't think such optimizations should be brought into this discussion.  Not all implementations optimize Arrays.  Some implementations optimize array-index properties on all objects, not just Array objects.  Regardless, such optimizations are supposed to be transparent (other than for performance impacts) and you can't depend upon optimizations being the same when you move between implementations or even versions of the same implementation.

Other than that, your ProxyArray looks like a fine implementation of what the spec. expects except for two things 1) you can't set [[Class]]==Array which isn't your fault, and 2) you should be setting the [[Prototype]] of your objects via Object.getPrototypeOf([ ]) rather than Array.prototype because Array may not be its original built-in value (of course you would also need to cache Object.getPrototypeOf so it amounts to about the same thing.)

> I have mentionned "Array-specific optimizations". The same way, in the spec, for most Array.prototype method is written something like "The filter function is intentionally generic; it does not require that its this value be an Array object.". In these two contexts, what does "Array" mean?
> My answer to this last question and my (partial) conclusion to the discussion is: 
> "An array is something created after a 'new Array()' call (where 'Array' is the built-in constructor define in the spec). "

"Array object" means an object whose characteristics are as specified in 15.4.5.  This is also what the Array constructor is supposed to create.  Note that in ES5 none of the built-in Array.prototype methods have dependencies upon the this object being "an Array  Object".  However, some built-in methods for other built-in constructor prototypes do have dependencies upon internal state that is specific to one kind of object. That is one of the primary uses of nominal typing in the ES5 spec, to match specific built-in methods with specific internal object representation variations that the methods depend upon.  Note that because we are talking about low level representation issues, this is something that Proxies don't help with.  You can't currently use a Proxy to create an object that  RegExp.prototype.exec with successfully work with.  

> As a consequence, it englobes things created after a call to "Array()" (ES5 - Create and return a new Array object exactly as if the standard built-in constructor Array was used in a new expression with the same arguments (15.4.2).) or array literals (ES5 11.1.4 Array Initialiser - semantics Step 1 of "ArrayLiteral : [ Elisionopt ]" and "ElementList : Elisionopt AssignmentExpression").
> In my opinion, at creation (direct or indirect "new Array()"), arrays should internally contain a marker saying they've being created with the built-in Array constructor. I think that the intention behind [[Class]] was to have such a marker. But in a way, it's "just a string" and we could imagine a world where the [[Class]] was settable on some objects. Nevertheless even if setting [[Class]] to arrays it wouldn't make them Arrays just because some string has the right value.

I'm leaning in a similar direction, but I don't think it should be tried to a specific constructor.  That isn't extensible enough. Actually, I think there is a potential for two orthogonal markers.  One corresponds to the internal MOP contract I discussed it above.  It defines how the object supports various language operations such as property access.  The other is the internal representation marker used to match built-in methods to internal object representations.  I think the MOP contract is probably useful to expose.  I'm less sure about the other as presumably only an implementation can define built-in methods and alternative representations at that level.  If you need to do something comparable at the Proxy level you could simply invent you own internal (to the handler) tagging scheme to make sure your methods were only applied to the right kind of Proxy object.

> Earlier, I said that it was a partial conclusion. I would like to add Array subclasses to the definition. The "Standardizing __proto__" thread led to the idea of standardizing a subclassing mechanism. In my opinion, objects constructed as subclassed arrays should benefit from the Array identity (return true to "Array.isArray") and as a consequence benefit from Array extras natively (in ES5, applying Array extras to non-Array objects is implementation-specific).

You're mistaken about Array extras and non-Array objects.  In ES5 all Array.prototype methods are generic across all native objects.  The implementation dependency concerns host objects.

I'm pretty sure that  when people ask for "Array subclass" they are really asking for objects that support the array instance internal mop contract.

Something you didn't mention is most of the line items listed in http://wiki.ecmascript.org/doku.php?id=strawman:es5_internal_nominal_typing .  I content that Array.isArray (or [[Class]]=="Array") is in most cases a over specification where what is really desired a a more limited test for some finer grained characteristic. This over specification limits extensibility.  For example, String concat uses an array Class check to determine whether an argument should be concatenated as a single object or whether it should be exploded into constituent values. Why is this necessarily tied to array-ness?  Why can a user define a collection abstraction that explodes when concatenated?

Thanks, for thinking about this stuff.  I think is useful to get feedback in this area.


> Of course, all of this is just my opinion and is open to debate.
> David
> Influencial readings :
> - http://wiki.ecmascript.org/doku.php?id=strawman:es5_internal_nominal_typing
> - ES5 new Array (http://es5.github.com/#x15.4.2.1)
> - http://perfectionkills.com/how-ecmascript-5-still-does-not-allow-to-subclass-an-array/
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20110330/4ac440b0/attachment-0001.html>

More information about the es-discuss mailing list