Feedback on Binary Data updates

Luke Hoban lukeh at microsoft.com
Thu Jul 21 22:03:08 PDT 2011


>The idea is definitely to subsume typed arrays as completely as possible.

Great.  

> The idea is that all Types have a known size, and all Data instances are allocated contiguously.

> For example, if you could put unsized array types inside of struct types, it wouldn't be clear how to allocate an instance of the struct:

>     var MyStruct = new StructType({
>         a: Uint8Array,
>         b: Uint8Array
>     });
>     var s = new MyStruct; // ???

> But you're right that this is inconsistent with typed arrays. Maybe this can be remedied by allowing both sized and unsized array types, and simply requiring nested types to be sized.

I see.  That makes sense from the struct type definition perspective.  My assumption is that this usage will be a fair bit less common than the use of the array constructor for directly allocating an array.  Having both sized and unsized may help, though the two are quite different, and it may be hard to sufficiently distinguish them.  I wonder if it is too subtle to have "Uint8Array(5)" be the sized type, and "new Uint8Array(5)" be the allocation of a new array?
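
To make the call-versus-construct distinction concrete, here is a rough sketch of how a single function could behave differently with and without "new". U8Array and the shape of the descriptor object are hypothetical names invented for illustration, not part of any proposal, and the sketch relies on new.target, which postdates this thread:

```javascript
// Hypothetical: U8Array(n) describes a sized type, new U8Array(n) allocates.
function U8Array(length) {
  if (new.target === undefined) {
    // Called without `new`: return a type descriptor and allocate nothing.
    return { kind: "sized-array-type", elementType: "uint8", length: length, byteLength: length };
  }
  // Called with `new`: an object returned from a constructor replaces `this`.
  return new Uint8Array(length);
}

const FiveBytes = U8Array(5);   // a type
const bytes = new U8Array(5);   // a value

console.log(FiveBytes.byteLength, bytes.length); // 5 5
console.log(bytes instanceof Uint8Array);        // true
```

The two call sites differ only by the "new" keyword, which is exactly the subtlety in question.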

> I don't know what you mean by copy constructors. Are you talking about being able to construct a type by wrapping it around an existing ArrayBuffer? That doesn't copy, but I do think we should support it, as I said in my preso at the f2f in San Bruno. That's something I intended to add to the wiki page but hadn't gotten to yet.

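The following makes a copy of the buffer (and something similar works if arr1 is a plain JS array):

  var arr1 = new Uint8Array(10);
  arr1[3] = 7;
  var arr2 = new Uint8Array(arr1);  // copies the contents of arr1
  arr2[3] === 7;                    // true: the value was copied
  arr2[4] = 5;
  arr1[4] !== 5;                    // true: arr1 is unaffected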
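
For contrast with wrapping an existing ArrayBuffer, here is a minimal sketch using the current Typed Arrays API. Constructing from a typed array copies, while constructing from its buffer shares storage. The variable names are only illustrative:

```javascript
// Copy: constructing from another typed array allocates a fresh buffer.
const source = new Uint8Array(10);
source[3] = 7;
const copy = new Uint8Array(source);
copy[4] = 5;                        // does not touch source

// View: constructing from the ArrayBuffer shares the same storage.
const view = new Uint8Array(source.buffer);
view[4] = 9;                        // visible through source as well

console.log(copy[3], source[4]);             // 7 9
console.log(copy.buffer === source.buffer);  // false
console.log(view.buffer === source.buffer);  // true
```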

> I do think there's a case to be made for not exposing the ArrayBuffer for Data objects that were not explicitly constructed on top of an ArrayBuffer. This would hide architecture-specific data that is currently leaked by the Typed Arrays API. It also accommodates the two classes of usage scenario involving binary data:
> ...
>
> Scenario 1 comes up when reading files, network sockets, etc; here you *have* to let the programmer control the endianness and layout/padding. The simplest way to do the latter is simply to assume zero padding, as with Data Views, and then the programmer would have to insert padding bytes where necessary.
> 
> Scenario 2 comes up when building internal data structures. Here the system should use whatever padding and endianness is going to be the most efficient for the architecture, but that detail should ideally not be exposed to the programmer. So in that case, we could make the .buffer field censored, by having it be null or an accessor that throws.

I see the two use cases, but I am a little concerned about the complexity of trying to support each with different representations of the struct.  For example, what happens if I am using the "pure compute" example, and then decide I want to be able to serialize my large in-memory representation up to a binary file on the server?  
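
As a sketch of what that transition looks like with today's APIs, assume the in-memory data is a Float64Array and the file format is a uint32 count followed by little-endian float64 values with no padding. The format and the serializePoints helper are made up for illustration:

```javascript
// In-memory representation: native layout, convenient for computation.
const points = new Float64Array([1.5, -2.0, 3.25, 4.0]);

// Hypothetical wire format: uint32 count, then little-endian float64s, no padding.
function serializePoints(values) {
  const buffer = new ArrayBuffer(4 + values.length * 8);
  const out = new DataView(buffer);
  out.setUint32(0, values.length, true);
  for (let i = 0; i < values.length; i++) {
    // Offsets 4, 12, 20, ... are not 8-byte aligned; DataView allows that.
    out.setFloat64(4 + i * 8, values[i], true);
  }
  return buffer;
}

const wire = serializePoints(points);
console.log(wire.byteLength); // 36
```

Every value has to be re-encoded field by field, because the zero-padding layout leaves the float64 values unaligned and a Float64Array cannot be laid directly over them.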

> I agree that this kind of use case is important, and I'm not opposed to DataViews, but I'm not sure the ArrayBuffer approach described above doesn't already handle this, e.g.:
>     new T(ArrayBuffer buffer, unsigned long byteOffset, optional boolean littleEndian);

Indeed - that does address the use case, and aligns with what you do currently to extract embedded fixed length arrays from the buffer. 
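
For reference, this is roughly how the existing Typed Arrays API extracts an embedded fixed-length array from a larger buffer. The record layout (a 4-byte header followed by an 8-byte payload) is invented for illustration:

```javascript
// Hypothetical record: 4 header bytes followed by an 8-byte payload.
const record = new ArrayBuffer(12);
new Uint8Array(record).set([0xCA, 0xFE, 0x00, 0x01, 1, 2, 3, 4, 5, 6, 7, 8]);

// A view over bytes 4..11 of the record; no copy is made.
const payload = new Uint8Array(record, 4, 8);
console.log(payload.length, payload[0]); // 8 1

// Multi-byte fields with explicit endianness go through DataView.
const header = new DataView(record, 0, 4);
console.log(header.getUint16(2, false)); // 1 (read as big-endian)
```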

> I'm open to this. I think there's no technical concern, just a question of what's the best "home" for Typed Arrays.

Right - I think that is the ultimate question.  My feeling is that an ES.next spec that only includes what is currently spec'd in Binary Data will feel incomplete for many practical tasks, and would end up effectively taking a dependency on the web platform to provide a complete array story.

> This doesn't really bother me. As you say, users don't really need to work with them; they're mostly there to set up the inheritance of shared methods, and they make for a nicely symmetric meta-class hierarchy. From the user's standpoint, they'll mostly just care about the primitive types, StructType and ArrayType, and then the type and data objects they create.

Yeah - I think this is likely not a significant concern.  And I don't have a concrete suggestion for simplifying this currently. 

> The reason I eliminated "block" was that it's such a highly-used term for many different things (e.g. block statements, block functions). The terms Type and Data are implicitly scoped to the @binary module, which is one of the benefits of modules: you don't have to explicitly scope every single definition's name to the subject matter at hand.

The module scoping does help, but the name "type" still feels even more overloaded than "block".  I won't bikeshed any further on naming yet though :-)

Luke



