Binary data (ByteArray/ByteVector) proposal on public-script-coord
Maciej Stachowiak
mjs at apple.com
Thu Nov 5 16:42:41 PST 2009
Added public-script-coord since discussion is happening here.
On Nov 5, 2009, at 3:08 PM, Alex Russell wrote:
> On Nov 5, 2009, at 2:48 PM, Maciej Stachowiak wrote:
>
>>
>> I pulled together a rough proposal for representing binary data in
>> ECMAScript and posted it on public-script-coord. I think having
>> this is important for many W3C specs, but it is probably best
>> defined in ECMAScript. I'm posting a link here in case anyone is
>> interested and is not on the public-script-coord mailing list yet:
>>
>> http://lists.w3.org/Archives/Public/public-script-coord/2009OctDec/0093.html
>
> Looks promising! A couple of thoughts:
>
> * the middle-ground approach seems interesting, although having
> them not be "real" arrays feels like we're just kicking the can down
> the road WRT the large-ish number of things that could be thought of
> as arrays but which don't act like them (NodeList, arguments, etc.).
I understand the concern. Indeed, for things like NodeList or
HTMLCollection or arguments, it's often very desirable
My claim is that Data is not much like these things. I believe it is
more like String. It happens to be a sequence (of a very specific
type), but it's specialized enough to be worth treating differently.
Do people often regret that String is not an Array? My impression is
that this is not a common concern. That's why I imagined this design
point.
But imagine we decided to go the other way and try to make these
things arrays:
(a) I believe DataBuilder could be made an Array without introducing
serious problems.
(b) I think Data could be made an array, but all the mutating methods
of Array (of which there are a great many) would always fail, so that
seems like poor API design. I'd prefer to have a design where the
immutable object lacks mutating methods entirely, rather than having
mutating methods that always fail. That being said, just the read-only
methods from the Array prototype could be provided.
(c) Array methods that return a new Array may be poor fits for Data/
DataBuilder - perhaps they could return a Data or DataBuilder instead
if they are provided.
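To make points (b) and (c) concrete, here is a rough sketch of what an
immutable, array-like Data might look like if it exposed only read-only
methods and returned Data rather than Array. The factory name makeData
and the method at() are hypothetical, not part of the strawman, and the
Uint8Array backing store is just an assumption made for the example.

```javascript
// Hypothetical sketch only: makeData and at() are illustrative names,
// not part of the Data/DataBuilder strawman.
function makeData(bytes) {
  // Private copy stored as unsigned 8-bit integers, so later changes
  // to the caller's input cannot affect this object.
  const store = Uint8Array.from(bytes);
  const data = {
    get length() { return store.length; },
    at(i) { return store[i]; },
    indexOf(b) { return store.indexOf(b); },
    forEach(fn) { store.forEach((b, i) => fn(b, i, data)); },
    // Methods that would return a new Array return a new Data instead.
    slice(start, end) { return makeData(store.slice(start, end)); },
    filter(fn) { return makeData(store.filter((b, i) => fn(b, i, data))); },
  };
  // No push/pop/splice/sort are defined at all, rather than defined
  // and always failing.
  return Object.freeze(data);
}

const d = makeData([0xff, 0xd8, 0xff, 0xe0]);
const head = d.slice(0, 2);          // a Data, not an Array
const hasPush = typeof d.push;       // "undefined"
```

The design choice shown is simply that the mutating methods are absent
from the immutable object instead of being present and throwing.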
> * any thoughts on type conversions? what does this do/return?:
>
> var bits = new Data("...");
> var res = bits += "?";
>
> will strings have a toData() protocol? Should other objects be
> able to implement such a protocol? will there be a canonical byte
> format for all strings in the language?
Converting a String to a Data presumably involves charset encoding/
decoding. I have not made a proposal for that in my initial strawman.
I do think charset transcoding is an extremely useful feature for many
use cases though, especially the ability to encode/decode UTF-8 and
WinLatin1. Since you need a choice of charset encoding to meaningfully
convert between binary data and strings, I think it's better not to
make it implicit, but rather have explicit named methods that can take
the encoding as a parameter. At least, that's my tentative thinking.
Another possibility is to assume that in cases where you don't specify
an encoding, strings are converted to/from UTF-16.
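As an illustration of why the encoding should be explicit, the sketch
below converts the same string to bytes under two encodings and gets
different results. The function name encodeString is hypothetical and
not part of the strawman; the UTF-8 case assumes an environment that
provides TextEncoder, and little-endian byte order for UTF-16 is an
assumption made for the example.

```javascript
// Hypothetical helper: encodeString is an illustrative name only.
function encodeString(str, encoding) {
  switch (encoding) {
    case "utf-8":
      // Assumes TextEncoder is available in the host environment.
      return new TextEncoder().encode(str);
    case "utf-16le": {
      // Write each UTF-16 code unit as two bytes, low byte first
      // (byte order is an assumption for this sketch).
      const out = new Uint8Array(str.length * 2);
      for (let i = 0; i < str.length; i++) {
        const unit = str.charCodeAt(i);
        out[2 * i] = unit & 0xff;
        out[2 * i + 1] = unit >> 8;
      }
      return out;
    }
    default:
      throw new RangeError("unsupported encoding: " + encoding);
  }
}

const asUtf8 = encodeString("\u00e9", "utf-8");     // bytes C3 A9
const asUtf16 = encodeString("\u00e9", "utf-16le"); // bytes E9 00
```

The same one-character string yields two different byte sequences,
which is the reason an implicit conversion would have to pick a default.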
> * given that Data are array-like things that have the property of
> being packed (like arguments), maybe we're just missing a
> PackedArray superclass in general that could help w/ the efficiency
> concerns (irrespective of mutability).
I'm not sure what you mean by being packed or the similarity to
arguments. The arguments object contains arbitrary values, whereas
Data contains only unsigned integers in the range 0-255. Data is
immutable. And with
Data, it may not often be desirable
> * what do you think about a toArray() method?
That can certainly be done. I am somewhat wary, because I think the
Array version will often be much less efficient in speed and memory,
and I believe it will rarely actually be useful. Imagine getting a raw
binary JPEG over the wire. It's extremely unlikely you'd want to
convert this to an Array, or call methods like filter() or map() on it.
> * do you envision any provision for multi-dimensional Data
> objects? E.g., <canvas> data.
The proposal here is that Data just holds raw binary data, without
imposing structure. If you want to use it to hold image data or a
frame buffer, then by my proposal, you have to do the indexing math
yourself.
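For example, the indexing math for image data might look like the
sketch below. The layout (row-major, 4 bytes per pixel in R, G, B, A
order, no row padding) and the names pixelOffset and readPixel are
assumptions for illustration; a plain Uint8Array stands in for Data.

```javascript
// Assumed layout: row-major, 4 bytes per pixel (R, G, B, A), no padding.
function pixelOffset(x, y, width, bytesPerPixel = 4) {
  return (y * width + x) * bytesPerPixel;
}

// Read one pixel from a flat byte sequence using the offset above.
function readPixel(bytes, x, y, width) {
  const o = pixelOffset(x, y, width);
  return { r: bytes[o], g: bytes[o + 1], b: bytes[o + 2], a: bytes[o + 3] };
}

// A 2x2 image: red, green on the first row; blue, white on the second.
const image = Uint8Array.from([
  255, 0, 0, 255,    0, 255, 0, 255,
  0, 0, 255, 255,    255, 255, 255, 255,
]);
const bottomRight = readPixel(image, 1, 1, 2);
```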
Regards,
Maciej