Map/Set/WeakMap constructor genericity

Allen Wirfs-Brock allen at wirfs-brock.com
Fri Nov 2 10:47:38 PDT 2012


On Nov 2, 2012, at 8:54 AM, Jason Orendorff wrote:

> In the draft spec, you can basically turn any arbitrary object into a Map.
> 
>     var obj = new Date;
>     Map.call(obj);
>     Map.prototype.set.call(obj, "x", "y");
>     assert(Map.prototype.get.call(obj, "x") === "y");
> 
> The same object can be a Set too. Why not?
> 
>     Set.call(obj, ["z"]);
>     assert(Set.prototype.has.call(obj, "z"), true);
> 
> This is intended to make Map/Set/WeakMap subclassable, which is fine.  But can we specify that without exposing Map initialization as a primitive that users can apply to arbitrary objects?
> 
> As specified, a single object can have [[MapData]] and [[WeakMapData]] and [[SetData]]. This is a pain to implement, and I don't see the benefit to web developers.

Yes, indeed, although I believe the implementation can be less painful than you think.

Before I plunge into this, it may be helpful to review http://wiki.ecmascript.org/doku.php?id=strawman:subclassable-builtins 

I think you agree that subclassability of built-ins is a valuable feature and that we want to avoid introducing any more non-subclassable built-ins (and, if possible, fix the existing ones so they are subclassable). So, it is a matter of how we accomplish that.

Another design principle I've generally applied is that built-in "classes" should absolutely minimize their specialness.  We may have a built-in for perf or security reasons, to access an external resource that would not otherwise be accessible, to bridge to the implementation layer, to optimize runtime representations, etc.  But, wherever possible, the standard ES library should not be "magic".  It should be "self-hostable" in ECMAScript code.

So, if we follow that design principle in making Map and friends subclassable, we should do it in a manner that is consistent with what would be done in a self-hosted implementation.

One of the characteristics of creating class abstractions in JavaScript (whether manually or via a class declaration) is that object allocation is separated from object initialization.  The expression new Foo first allocates an ordinary object and then calls Foo to initialize it. Any specialness of the object derives not from its allocation but from the manner in which it is initialized.  For example, Foo might place a private-symbol-keyed property on the object to brand it as being a special Foo object.  Or, in the case of a Map object, the Map constructor might associate special "MapData" internal state with the object via a private-symbol-keyed property.
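As a rough illustration of allocation being separate from initialization, here is a minimal sketch using an old-style constructor function. The names `Foo`, `fooBrand`, and `isFoo` are hypothetical. Standard JavaScript never got private symbols, so an ordinary `Symbol()` stands in for one; it is unique, but still discoverable via `Object.getOwnPropertySymbols`. The sketch shows that `new Foo()` behaves like allocating an ordinary object with the right prototype and then calling `Foo` on it.

```javascript
// Hypothetical brand key. A plain Symbol is unique but not truly private;
// it only approximates the private symbols discussed in the post.
const fooBrand = Symbol("FooBrand");

function Foo() {
  // Initialization only: `this` was already allocated by `new`
  // (or by whoever is calling Foo on an existing object).
  this[fooBrand] = true;
}

function isFoo(obj) {
  // The "specialness" comes entirely from what initialization added.
  return obj != null && obj[fooBrand] === true;
}

// Path 1: the usual `new Foo()` expression.
const a = new Foo();

// Path 2: allocate an ordinary object, then initialize it by calling Foo.
const b = Object.create(Foo.prototype);
Foo.call(b);

console.log(isFoo(a), isFoo(b));                         // true true
console.log(Object.getPrototypeOf(b) === Foo.prototype); // true
```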

If we are going to subclass such a Foo (or Map) object (let's talk about a Bar subclass), the Bar constructor function needs to be able to call Foo to initialize the instance.  We do this by making a super call to Foo:
class Bar extends Foo {
   constructor() {
      // maybe do some initialization on this here
      super.constructor();  // or just super(); either really means pretty much the same as Foo.call(this)
      // maybe do some other initialization on this here
   }
}

So, Foo has to be prepared to deal with an arbitrary this object that may have already had some initialization performed upon it.  It also means that anybody who has visibility of Foo (via either a name binding or via the constructor property of a Foo instance) can call Foo to initialize any arbitrary object. And there is nothing that prevents someone from calling Foo, Bar, Map, and any other constructors all on the same object.  If the initialization actions performed by all of the constructors are disjoint, then everything should work just fine.  If the different constructors interfere with each other, you will have a buggy object, but this is one of an infinite number of ways to define a buggy object.  Also note that calling multiple constructors is not necessarily an unreasonable thing to do: consider, for example, a package that supports a multiple inheritance layer for JavaScript.
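To make the "disjoint initializers" point concrete, the following sketch mirrors Jason's example with two self-hosted stand-ins. `MiniMap`, `MiniSet`, `mapData`, and `setData` are hypothetical names, not the real built-ins. Each constructor keeps its state under its own unique symbol, so initializing one arbitrary object with both does not cause interference. Lookups here use `===` via `indexOf`/`includes`, which differs from the real Map's SameValueZero in a few edge cases (for example, `indexOf` never finds `NaN`).

```javascript
// Each hypothetical "class" owns a distinct symbol, so their state is disjoint.
const mapData = Symbol("MapData");
const setData = Symbol("SetData");

function MiniMap() {
  this[mapData] = { keys: [], values: [] };
}
MiniMap.prototype.set = function (key, value) {
  const data = this[mapData];
  if (!data) throw new TypeError("not a MiniMap");
  const i = data.keys.indexOf(key); // linear search, === comparison
  if (i >= 0) data.values[i] = value;
  else { data.keys.push(key); data.values.push(value); }
  return this;
};
MiniMap.prototype.get = function (key) {
  const data = this[mapData];
  if (!data) throw new TypeError("not a MiniMap");
  const i = data.keys.indexOf(key);
  return i >= 0 ? data.values[i] : undefined;
};

function MiniSet() {
  this[setData] = [];
}
MiniSet.prototype.add = function (value) {
  const data = this[setData];
  if (!data) throw new TypeError("not a MiniSet");
  if (!data.includes(value)) data.push(value);
  return this;
};
MiniSet.prototype.has = function (value) {
  const data = this[setData];
  if (!data) throw new TypeError("not a MiniSet");
  return data.includes(value);
};

// One arbitrary object, initialized by both constructors.
const obj = new Date();
MiniMap.call(obj);
MiniSet.call(obj);
MiniMap.prototype.set.call(obj, "x", "y");
MiniSet.prototype.add.call(obj, "z");

console.log(MiniMap.prototype.get.call(obj, "x")); // y
console.log(MiniSet.prototype.has.call(obj, "z")); // true
```

The Date object keeps working as a Date, because neither initializer touches anything the Date methods rely on.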

So, the possibility that an object may be initialized by multiple constructors is an inherent part of JavaScript, and if we are following the builtins-aren't-magic principle, we should expect this to apply to them as much as to any other objects.

There is one way to code classes so that allocation is coupled to initialization.  Move allocation into the constructor:

class Foo {
   constructor () {
       let self = Object.create(Object.getPrototypeOf(this));
       // initialize self
       return self;
   }
}

However, then a subclass has to be written as:
class Bar extends Foo {
   constructor() {
      let self = super();
      // initialize Bar state using self to reference the instance
      return self;
   }
}

This seems error prone in many ways: the subclass has to remember to capture the result of the super.constructor call; constructors always need to end with an explicit return of self; you have to avoid using this within constructors; you can't make super calls to any methods other than the constructor, etc.  And it breaks many multiple inheritance scenarios.  The cure seems worse than the disease.  Very few people are going to get screwed up by unintentionally multiply initializing some object. A lot of people will make mistakes if they have to remember to apply the above patterns to their superclass/subclass constructors.

I think the bottom line is that the possibility of an object having multiple initializers is inherent to JS, whether we are talking about built-in or user-defined objects.

Now let's talk about the implementation pain.

In JS code, we would avoid conflict among multiple initializers by using a unique and/or private symbol to access state that is specific to some "class".  The same thing can be done at the built-in implementation level, as I discuss in the above reference.  When I specify an internal data property such as [[MapData]] in the spec, what I'm saying is that here is private internal state that you might internally represent using a private-symbol-named own property.  You can do something else, but it needs to have the dynamic extensibility characteristics of object properties.  In other words, you have to be able to add it after you allocate the object.
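A minimal sketch of a [[MapData]]-like slot modeled as a symbol-keyed own property that is added after allocation, with a brand check in front of it. `MapDataKey`, `initializeMapData`, and `getMapData` are hypothetical names, and making the property non-enumerable and non-writable is this sketch's own choice, not something the post specifies.

```javascript
// Hypothetical stand-in for the spec's [[MapData]]: a unique symbol used as
// the key of an own property that is added *after* the object is allocated.
const MapDataKey = Symbol("[[MapData]]");

function initializeMapData(obj) {
  // Non-enumerable and non-writable (an assumption of this sketch), so the
  // slot stays out of ordinary enumeration and cannot be reassigned.
  Object.defineProperty(obj, MapDataKey, {
    value: { keys: [], values: [] },
    enumerable: false,
    writable: false,
    configurable: false,
  });
  return obj;
}

function getMapData(obj) {
  // Brand check: only objects that went through initializeMapData pass.
  if (obj === null || typeof obj !== "object" ||
      !Object.prototype.hasOwnProperty.call(obj, MapDataKey)) {
    throw new TypeError("object does not have [[MapData]]");
  }
  return obj[MapDataKey];
}

// Allocate first (an ordinary object), initialize later.
const o = {};
initializeMapData(o);
console.log(Object.keys(o).length);             // 0
console.log(Array.isArray(getMapData(o).keys)); // true
```

Because the slot is just an object property in this sketch, it shares the usual extensibility rules: `initializeMapData(Object.preventExtensions({}))` throws a TypeError.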

That probably precludes you from directly using a unique C/C++ data structure for such objects.  But nothing stops you from having such a data structure and using it as the value of a private-symbol-keyed property known only to you.  This adds the overhead of a property indirection to get at your native data structure.  I can think of various optimizations to avoid that indirection in most cases, but I'm not sure it would be worth it.
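The indirection can be sketched in plain JavaScript, assuming a hypothetical `NativeStore` class as a stand-in for an engine's native structure. It is backed by a built-in Map purely for brevity. `storeKey`, `storeOf`, `SymbolMap`, and `CountingMap` are also hypothetical. Every method first fetches the store from the symbol-keyed property, which is the extra lookup mentioned above, and a subclass reaches the initializer through `super()`.

```javascript
// Hypothetical stand-in for an engine-internal native data structure.
class NativeStore {
  constructor() { this.table = new Map(); }
}

// Only code in this module holds this key.
const storeKey = Symbol("store");

// The indirection: one property lookup to reach the native structure.
function storeOf(obj) {
  const store = obj == null ? undefined : obj[storeKey];
  if (!(store instanceof NativeStore)) throw new TypeError("not a SymbolMap");
  return store;
}

class SymbolMap {
  constructor() {
    // Initialization attaches the native structure to whatever `this` is.
    this[storeKey] = new NativeStore();
  }
  get(key) { return storeOf(this).table.get(key); }
  set(key, value) { storeOf(this).table.set(key, value); return this; }
  has(key) { return storeOf(this).table.has(key); }
}

// A subclass: super() runs SymbolMap's initializer on the new instance.
class CountingMap extends SymbolMap {
  constructor() {
    super();
    this.writes = 0;
  }
  set(key, value) {
    this.writes += 1;
    return super.set(key, value);
  }
}

const m = new CountingMap();
m.set("a", 1).set("b", 2);
console.log(m.get("b"), m.writes); // 2 2
```

This uses class syntax as it was eventually standardized. Class constructors there cannot be invoked with `.call`, so in this sketch only `super()` (or `new`) can run the initializer.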

So, yes, subclassable built-ins add (or change) some things for built-in implementations. There may be a little pain at first, but I think it is worth it in the long run.

Allen

> 
> -j
> 
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss