Feedback on Binary Data updates

David Herman dherman at mozilla.com
Wed Jul 20 10:52:16 PDT 2011


Hi Luke,

The idea is definitely to subsume typed arrays as completely as possible.

> * Array types of fixed length
> The current design fixes the length of an ArrayType instance as part of the ArrayType definition, instead of as a parameter to the resulting constructor.  I'm not sure I understand the motivation for that.

The idea is that all Types have a known size, and all Data instances are allocated contiguously.

For example, if you could put unsized array types inside of struct types, it wouldn't be clear how to allocate an instance of the struct:

    var MyStruct = new StructType({
        a: Uint8Array,
        b: Uint8Array
    });
    var s = new MyStruct; // ???

But you're right that this is inconsistent with typed arrays. Maybe this can be remedied by allowing both sized and unsized array types, and simply requiring nested types to be sized.
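To make the sizing argument concrete, here is a rough sketch in plain JavaScript. It is not the proposed API: `sizeOf` and the descriptor shapes (`bytes`, `element`, `length`, `fields`) are hypothetical names invented for illustration, and zero padding is assumed. The point is only that a struct's byte size is computable exactly when every nested array type carries a length.

```javascript
// Hypothetical type descriptors, for illustration only (not the proposed API).
const uint8 = { bytes: 1 };
const uint32 = { bytes: 4 };

// An array type is { element, length }; a struct type is { fields: {...} }.
function sizeOf(type) {
    if (typeof type.bytes === "number") return type.bytes;
    if (type.element) {
        if (typeof type.length !== "number") {
            throw new TypeError("unsized array type cannot be nested");
        }
        return type.length * sizeOf(type.element);
    }
    // Struct: sum of the field sizes, assuming zero padding.
    return Object.values(type.fields).reduce((n, t) => n + sizeOf(t), 0);
}

const sized = { fields: { a: { element: uint8, length: 16 }, b: uint32 } };
const unsized = { fields: { a: { element: uint8 }, b: { element: uint8 } } };

const sizedBytes = sizeOf(sized); // 16 * 1 + 4

let unsizedError = null;
try { sizeOf(unsized); } catch (e) { unsizedError = e; }
```

Under this scheme an unsized array type could still exist on its own, with the length supplied at construction time, as long as it is rejected when nested.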

> * Compatibility with Typed Arrays array objects
> There are a few divergences between Binary Data arrays and Typed Array arrays, that look like they could be addressed:
> - The constructor difference mentioned above, including support for copy constructors.

I don't know what you mean by copy constructors. Are you talking about being able to construct a type by wrapping it around an existing ArrayBuffer? That doesn't copy, but I do think we should support it, as I said in my presentation at the face-to-face meeting in San Bruno. That's something I intended to add to the wiki page but hadn't gotten to yet.
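For reference, existing Typed Arrays distinguish the two cases as sketched below, which may be the distinction at issue here: constructing from an ArrayBuffer wraps it, while constructing from another typed array copies.

```javascript
const buffer = new ArrayBuffer(4);

// Wrapping: the view aliases the buffer's storage; no bytes are copied.
const wrapped = new Uint8Array(buffer);

// Copying: constructing from another typed array allocates fresh storage.
const copied = new Uint8Array(wrapped);

wrapped[0] = 42;
const seenThroughBuffer = new Uint8Array(buffer)[0]; // shares storage with `wrapped`
const seenInCopy = copied[0];                        // independent storage
```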

> - Lack of buffer, byteLength, byteOffset, BYTES_PER_ELEMENT.   I see these are noted in TODO.

Yep.

I do think there's a case to be made for not exposing the ArrayBuffer for Data objects that were not explicitly constructed on top of an ArrayBuffer. This would hide architecture-specific data that is currently leaked by the Typed Arrays API. It also accommodates the two classes of usage scenario involving binary data:

Scenario 1: I/O

    socket.readBuffer(1000, function(buf) {
        var s = new MyStruct(buf, 0); // also allow an optional endianness argument
        // ... do some computation on s ...
    });

Scenario 2: Pure computation

    var s = new MyStruct({ x: 0, y: 0 });
    // ... do some computation on s ...

Scenario 1 comes up when reading files, network sockets, etc.; here you *have* to let the programmer control the endianness and layout/padding. The simplest way to handle layout is to assume zero padding, as with DataViews, and have the programmer insert padding bytes where necessary.
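As an illustration of explicit layout and endianness, here is a small sketch using the existing DataView API. The wire format (a 1-byte tag, 3 padding bytes, then a 32-bit big-endian length) is a made-up example, not something from the proposal.

```javascript
// Hypothetical wire format: 1-byte tag, 3 explicit padding bytes,
// then a 32-bit big-endian length. Nothing is padded implicitly.
const buf = new ArrayBuffer(8);
const view = new DataView(buf);

view.setUint8(0, 7);            // tag at offset 0
view.setUint32(4, 1000, false); // length at offset 4, big-endian

const tag = view.getUint8(0);
const lengthBE = view.getUint32(4, false); // read with the matching endianness
const lengthLE = view.getUint32(4, true);  // same bytes read as little-endian
```

Reading the same four bytes with the wrong endianness yields a different number, which is why the I/O scenario cannot leave this choice to the platform.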

Scenario 2 comes up when building internal data structures. Here the system should use whatever padding and endianness is going to be the most efficient for the architecture, but that detail should ideally not be exposed to the programmer. So in that case, we could make the .buffer field censored, by having it be null or an accessor that throws.
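One way the "accessor that throws" option might look, sketched in plain JavaScript. `makeOpaqueData` is a hypothetical helper invented for illustration; it only mimics the observable behavior and says nothing about how an engine would actually lay out the data.

```javascript
// Hypothetical helper, for illustration only: models data allocated internally
// whose backing store is censored by an accessor that throws.
function makeOpaqueData(fields) {
    const data = Object.assign({}, fields);
    Object.defineProperty(data, "buffer", {
        get() {
            throw new TypeError("backing buffer is not exposed");
        },
        enumerable: false,
    });
    return data;
}

const s = makeOpaqueData({ x: 0, y: 0 });
s.x = 3; // ordinary field access still works

let bufferError = null;
try { s.buffer; } catch (e) { bufferError = e; }
```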

> - array.set(otherArr, offset) support on the Binary Data arrays

Good catch; looks unproblematic.
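For reference, this is the existing Typed Arrays behavior that Binary Data arrays would presumably mirror:

```javascript
const target = new Uint8Array(6);
const source = new Uint8Array([1, 2, 3]);

// Copy all of `source` into `target`, starting at element index 2.
target.set(source, 2);

const result = Array.from(target);
```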

> - Conversions, see below
> - Different prototype chains, additional members like elementType on binary data arrays.  
> 
> The last item is one of the reasons why it would be nice to pull the Typed Arrays objects into Binary Data, so that they could be augment to be fully consistent - for example, to expose the elementType.

If we can pull them into the prototype hierarchy, that's cool, but we still have to see. In particular, if we want to close off some of the leaks I describe above, then we may have to retain some distinction.

> * Conversions
> The rules for conversions of argument values into the primitive value types seem to be different than typical ES conversions and those used by TypedArrays via WebIDL.  Why not use ToInt32 and friends for conversion?  Current rules appear to be quite strict - throwing on most type mismatches, and also more permissive for some unexpected cases like "0x"-prefixed strings.

Interesting question. I may have followed js-ctypes too blindly on this.
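For comparison, this is how the ordinary ES conversions behave when storing into a typed array in current engines. Whether Binary Data should match this is the open question; the sketch only shows the ToInt32-style behavior being suggested.

```javascript
const ints = new Int32Array(3);

ints[0] = 2 ** 32 + 5; // wraps modulo 2^32
ints[1] = "0x10";      // a string goes through ToNumber first
ints[2] = "abc";       // NaN is stored as 0 rather than throwing

const converted = Array.from(ints);
const viaToInt32 = (2 ** 32 + 5) | 0; // the same wrap-around via a bitwise operator
```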

> * DataView integration with structs
> DataView is an important piece of Typed Arrays for reading from heterogeneous binary data sources like files and network protocols, and for controlling endianness of data reads.  DataView would seem to benefit from structs, and structs would benefit from DataView.  This is another reason to want to spec DataView itself in ES.next.  I imagine an additional pair of functions on DataView akin to the following would allow nice interop between DataView and Binary Data "Types"/"Data":
> 
>    Data getData(Type type, unsigned long byteOffset, optional boolean littleEndian);
>    void setData(Type type, unsigned long byteOffset, Data value, optional boolean littleEndian);

I agree that this kind of use case is important, and I'm not opposed to DataViews, but I think the ArrayBuffer approach described above may already handle this, e.g.:

    new T(ArrayBuffer buffer, unsigned long byteOffset, optional boolean littleEndian);
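A rough emulation of that constructor shape on top of today's DataView, to show what it would buy. `PointView` is a hypothetical hand-written class standing in for a struct type of two uint16 fields; it is not the proposed API.

```javascript
// Hypothetical struct view over { x: uint16, y: uint16 }; not the proposed API.
class PointView {
    constructor(buffer, byteOffset = 0, littleEndian = false) {
        this.view = new DataView(buffer, byteOffset, 4);
        this.littleEndian = littleEndian;
    }
    get x() { return this.view.getUint16(0, this.littleEndian); }
    set x(v) { this.view.setUint16(0, v, this.littleEndian); }
    get y() { return this.view.getUint16(2, this.littleEndian); }
    set y(v) { this.view.setUint16(2, v, this.littleEndian); }
}

const bytes = new Uint8Array([0xff, 0xff, 0x01, 0x00, 0x02, 0x00]);
const p = new PointView(bytes.buffer, 2, true); // skip a 2-byte prefix

const before = [p.x, p.y];
p.y = 0x0304; // writes go straight through to the underlying buffer
const rawAfter = Array.from(bytes.slice(4, 6));
```

Because the instance is a live view rather than a copy, reads and writes on its fields cover the same ground as a getData/setData pair on DataView.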

> * Explicit inclusion of Uint32Array and similar objects
> The Uint32Array and similar objects defined in Typed Arrays are the ones that are likely to be the most commonly used in many/most use cases, but these are missing from the ES.next proposal.  Including them in the ES.next proposal explicitly, as supersets of the Typed Arrays objects, would avoid users having to manually create them, and help ensure full API consistency.

I'm open to this. I think there's no technical concern, just a question of what's the best "home" for Typed Arrays.

> * A lot of meta- objects
> The spec defines 14 objects, without yet defining any of the 10 typed arrays objects.  Several of the objects only serve as scaffolding for the meta-hierarchy, and don't appear to be objects which users are expected to frequently (or ever) work with.  Are the named "Type" and "Data" objects needed in the proposal?

This doesn't really bother me. As you say, users don't really need to work with them; they're mostly there to set up the inheritance of shared methods, and they make for a nicely symmetric meta-class hierarchy. From the user's standpoint, they'll mostly just care about the primitive types, StructType and ArrayType, and then the type and data objects they create.

> * Naming
> The term "Type" feels somewhat too generic for referring to struct shapes.  The previous "block" terminology actually sounded more natural, or at least more scoped.

The reason I eliminated "block" was that it's such a highly-used term for many different things (e.g. block statements, block functions). The terms Type and Data are implicitly scoped to the @binary module, which is one of the benefits of modules: you don't have to explicitly scope every single definition's name to the subject matter at hand.

Dave


