Array length property wrap-around

Kent Hansen khansen at trolltech.com
Thu Nov 20 06:36:19 PST 2008


David-Sarah Hopwood wrote:
> Kent Hansen wrote:
>   
>> Hi,
>> What's supposed to happen when one of the built-in methods (e.g.
>> Array.prototype.push) tries to assign a value greater than 4294967295 to
>> the length property?
>>
>> js> a = new Array(4294967295); a.push("foo")
>> 0
>>
>> i.e. the length becomes 0.
>>     
>
> This is a specification bug in the Array.prototype.push algorithm
> (section 15.4.4.7), due to the ToUint32 coercion in step 2.
>
> Suppose for the sake of argument that there were no such coercion,
> or a coercion to a wider type, for example that step 2 were
> something like "Let n be ToNumber(Result(1))." Then:
>
>   Assume first that the 'this' value is a native Array object, i.e.
>   the algorithm of section 15.4.5.1 ([[Put]] in ES3, [[ThrowablePut]]
>   in the Kona draft) is being used to put array properties.
>
>   In step 13 of the array put algorithm, the result of ToUint32(V),
>   where V is the new array length, is compared to the result of
>   ToNumber(V), and a RangeError exception is thrown if they are
>   different.
>
>   So, the behaviour would be that the Array.prototype.push
>   algorithm would run up to the point at which it attempts to set
>   index [4294967295] to the next argument. Since this is not a
>   valid array index, it would be detected in the above-mentioned
>   step 13 (because ToUint32(4294967296) = 0 whereas
>   ToNumber(4294967296) = 4294967296), and a RangeError would be
>   thrown. This behaviour would be entirely reasonable.
>   

No, step 13 applies to setting the "length" property. It would instead 
be detected in step 8: "If P is not an array index, return"; 4294967295 
is not an array index.
So setting that property should work just fine (and does, in the 
implementations I tried).

>   (A case could be made that Array.prototype.push should detect
>   based on the number of arguments and the initial length that
>   pushing some argument is *going* to fail, and fail early with
>   no effect. Few other algorithms in ES3/3.1 make this strength
>   of exception guarantee, and the proposed fixes below do not
>   make it. In any case, there are other ways the algorithm can
>   throw an exception after side-effects have occurred.)
>   

I think it's OK that the algorithm doesn't do the is-going-to-fail check.
>   Note that step 7 of the Array.prototype.push algorithm is
>   redundant in the case of a native Array object, because the
>   preceding [[Put]] will already have set the 'length' property
>   correctly (if it needs to be set, i.e. if there are more than
>   zero arguments). It is only required for the case where the
>   algorithm is being used on a generic array-like object.
>   

Indeed.
>
> The ToUint32 coercion in Array.prototype.push, however, causes
> a wrap-around, so that the final of n is calculated modulo 2^32
> (and so are the intermediate values, which is why no RangeError
> occurs). Any array index properties beyond the wrapped value should
> then be deleted by the array put algorithm, due to its special case
> handling of 'length', which is set by Array.prototype.push in step 7.
>   

No, "the final of n" is not calculated modulo 2^32; after n is increased 
there is no new modulo operation (neither in ECMA-262 or in the Kona 
version of draft). So I still think step 7 should cause a RangeError; 
it's the implementations that got it wrong.
>   
>> This also happens if you apply push to
>> non-Array instances, which don't have the length property constraint
>> described in 15.4.
>>     
>
> Yes, because the initial ToUint32 coercion also happens in that case.
>
>   
>> Step 5 of 15.4.4.7 says to "increase n by 1", but it does not say that n
>> should wrap around to 0 if  the new value can't be represented as a
>> 32-bit unsigned integer (the ToUint32 conversion is only done when
>> assigning the initial value of n, in step 2).
>>     
>
> Hopefully the above explains this. n has already wrapped by that point.
>   

Section 9.6 states that "ToUint32 converts its argument to one of 2^32 
integer values 0 through 2^32-1, inclusive". 2^32-1 = 4294967295, so the 
initial assignment of n in Array.prototype.push does not cause a wrap. 
Hence the value of n at step 7 would be 2^32 in my example --> RangeError.
> However, this is not the whole story: there is a separate
> implementation bug in one or more of the implementations you
> are testing with -- the one(s) that have prompt "js>".
>
>   
>> If I interpret the spec correctly, n should become 4294967296, and step
>> 7 should then cause a RangeError when trying to assign the new length.
>> In the implementations I've tried, this is not the case, and instead you
>> get this interesting behavior:
>>
>> js > a = new Array(4294967294); a.push("foo", "bar")
>> 0
>> js> a[4294967294]
>> foo
>>
>> Oops; the length invariant broke.
>>     
>
> Yes, and that can only happen if there is an implementation bug in
> the handling of array puts. It is possible that the implementation(s)
> optimize 'push' in a way that bypasses the normal array put code;
> if so then the bug could be there, otherwise it must be in the
> general handling of array [[Put]]/[[ThrowablePut]].
>
> Which implementation(s) did you test with?
>   

 > js --version
JavaScript-C 1.7.0 2007-10-03

JavaScriptCore (jsc) is from yesterday's WebKit trunk.

>   
>> With WebKit:
>>
>> jsc > a = new Array(4294967294); a.push("foo", "bar")
>> 0
>> jsc> a[4294967294]
>> undefined
>>
>> Oops; all the properties 0,1,2,...,4294967294 were wiped out.
>>     
>
> WebKit is "correctly" conforming to the flawed ES3 specification
> of Array.prototype.push as described above.
>   

No, I still say that if it actually followed the spec it would throw a 
RangeError. :-) If you look at the JSC implementation of push, you will 
find that it uses an unsigned int to hold n and calculate the new 
length, so on 32-bit processors at least it's going to (wrongfully) wrap 
around. JSC uses an optimized case for Array instances, like you 
suspected, but the bug is the same for non-Arrays. (The straight-forward 
patch in JSC is to replace "unsigned length" with "double length" in the 
slow-case initialization; then the function behaves according to spec 
for non-Arrays, at least.)
>   
>> I also question step 2 of 15.4.4.7. Why is the narrow ToUint32
>> conversion necessary; shouldn't it be a variant of ToInteger?
>>     
>
> It should either be a wider conversion, or it should be checked
> to be in the range 0..2**32 - 1. A ToInteger conversion is not
> correct, though. For objects that are not native arrays but are
> sufficiently array-like for the semantics of 'push' to make sense,
> the initial value of 'length' should represent a non-negative
> integer within the contiguous range of integers that can be
> represented exactly as a number value. For compatibility, we need
> to also accept values that coerce via ToNumber to such an integer.
>   

Yes, that's essentially what I meant by the vague "a variant of". :-)
> (Technically, any change in this conversion is a theoretical
> incompatibility, because a program could deliberately set an
> integer value of 'length' that differs from the intended length
> by a multiple of 2^32. I think it's extremely unlikely that any
> code would be relying on this -- we have no obligation to maintain
> this degree of bug-for-bug compatibility. But the change should go
> into Annex E anyway.)
>
>   
>> We know
>> that for Array instances, the invariant on the length property holds
>> anyway, so the result of the conversion will be the same; but using
>> ToUint32 makes the function less generic:
>>
>> js> o = { length: 4294967296 }; Array.prototype.push.call(o, "foo")
>> 1
>>
>> Pretending I hadn't read the spec, I would have expected the above to
>> create a property named "4294967296" and increase length by 1; no 32-bit
>> conversion imposed.
>>     
>
> That's possible, but for simplicity my proposed fix below imposes the
> same length limitation on all array-like objects. This matters in the
> case of methods like 'map' and 'filter', which create an Array object
> that is no longer than their input array-like object. Also, if this
> limit were not imposed then there would be another set of checks needed
> around 2**53, when integers representable as a number value are no
> longer contiguous.
>   

Sounds fair.
>   
>> Looking at the other functions in Array.prototype I see that they do
>> ToUint32 on the length too, so this change might become a bit large and
>> dangerous.
>>     
>
> Yep. This bug occurs in all of the Array methods except toString and concat.
>
> toString avoids it only by delegating to join.
>
> concat avoids it by checking explicitly whether 'this' and each of its
> arguments are native Array objects (and so it can assume that their length
> is an integer in the correct range); if not then it behaves as if the
> object were wrapped in a single-element array.
> (The NOTE at the end of section 15.4.4.4 is potentially misleading
> because this behaviour is different to any of the other array generics,
> and does not allow using array-like objects as if they were native arrays.
> It should probably be clarified.)
>   

Yes, that's an interesting observation.
>
> To limit the size and risk of the change, I suggest defining a
> "drop-in replacement" for ToUint32 that can be used in all of the
> Array methods. With that approach, it is easy to allow implementations
> to increase the maximum length of arrays. (If anyone objects to that
> aspect of the changes in particular, it's not necessary --
> ArrayLengthLimit below could be replaced with the constant 2**32 - 1.)
> Also, no renumbering of steps is necessary.
>
>
> ====
> 15.4 Array Objects
>
> Array objects give special treatment to a certain class of property names.
> Let ArrayLengthLimit be an implementation-defined constant integer
> in the range 2**32 - 1 to 2**53 - 1 inclusive. A property name P
> (in the form of a string value) is an array index if and only if
> ToString(ToNumber(P)) is equal to P and ToNumber(P) is a nonnegative
> integer less than ArrayLengthLimit. Every Array object has a 'length'
> property whose value is always a nonnegative integer less than or
> equal to ArrayLengthLimit. The value of the length property is greater
> than the numeric value of the name of every property whose name is an
> array index; whenever a property of an Array object is created or changed,
> other properties are adjusted as necessary to maintain this invariant.
> [existing text retained]
> ... inherited from its prototype.
>
> Many algorithms in this specification can operate either on Array
> objects, or on other objects that have a 'length' property and other
> properties whose names are array indices. To allow for this, the
> following operator is used to coerce the 'length' property of an
> object to an array length:
>
> ToArrayLength(V)
>
>  1. Call ToNumber(V).
>  2. If Result(1) is not a nonnegative integer less than or equal to
>     ArrayLengthLimit, then throw a RangeError.
>  3. Return Result(1).
> ====
>   

Looks OK.
> Replace the following paragraph in section 15.4.2.2:
>
> # If the argument len is a Number and ToUint32(len) is equal to len,
> # then the length property of the newly constructed object is set to
> # ToUint32(len). If the argument len is a Number and ToUint32(len)
> # is not equal to len, a RangeError exception is thrown.
>
> with
>   If the argument len is a Number, then call ToArrayLength(len).
>   If this does not result in a RangeError exception being thrown,
>   set the length property of the newly constructed object to the
>   result.
>   

The wording "If this does not result in a RangeError exception being 
thrown" seems a bit alien. Usually these sort of things are described by 
algorithms, where ToArrayLength(len) would be one step.
>
> Change each of the following:
>  - steps 13 and 18 of 15.4.4.4 (Array.prototype.concat)
>  - step 5 of 15.4.4.7 (Array.prototype.push)
>
> to "Let the new value of n be ToArrayLength(n+1)."
>   

To be in style with the existing algorithms, ToArrayLength(n+1) should 
probably be a separate step. This makes it clearer that the operation 
can actually cause the algorithm to abort (in which case the assignment 
to n won't happen).
And I don't think you need "the new value of". (Sidenote: When trying to 
find the notation used for reassigning a variable (as opposed to just 
increasing or decreasing by 1), I spotted a small issue in 15.4.4.13 
Array.prototype.unshift: Step 5 says "If k is zero, go to step 15"; Step 
15 says "Let k be 0"; and there is no other way to reach Step 15, so it 
can be eliminated (16 is the new 15) -- assuming "zero" and "0" are 
equivalent, that is. ;-) Actually the spec is not consistent here, some 
places "is zero" is used, other places " = 0" is used. I digress.)
> Change step 7 of 15.4.4.13 to "Call ToString(ToArrayLength(k+Result(3))–1)."
> (Alernatively, insert a call to ToArrayLength(k+Result(3)) at step 5,
> and use that result in the last two steps for the final length.)
>   

It seems that calling ToArrayLength at that point can introduce the kind 
of if-it-will-fail-then-throw behavior that you argued against in the 
push case. In any case I don't see why ToArrayLength has to be applied 
to the _indexes_. My suggested change:

21. Call ToArrayLength(Result(2)+Result(3)).
22. Call the [[Put]] method of this object with arguments "length" and 
Result(21).
23. Return Result(21).

> Replace all uses of ToUint32 in sections 15.4.* with ToArrayLength.
>   

OK.
>
> I have checked that all of these replacements are appropriate, and
> that the resulting algorithms except 'splice', 'indexOf', and
> 'lastIndexOf' have no remaining overflow bugs.
>   

Nice.
> 'splice' is patently too complicated to review when written in the
> ECMA-262 algorithm notation; someone should transliterate it to a
> real programming language (even ECMAScript counts here ;-) so that
> it is easier to check. It looks like it can create an array longer
> than its input arrays and so needs a ToArrayLength call somewhere,
> but where is anyone's guess. 'indexOf' and 'lastIndexOf' incorrectly
> call ToInt32 and generally need more work; I won't attempt to fix
> them here.
>   

Yeah, that one's a strange beast. To me it seems natural (and the 
safest) to insert an extra step before current steps 16 and 53, .e.g.

16. Call ToArrayLength(Result(6)).
17. Call the [[Put]] method of A with arguments "length" and Result(16).
> The resulting 'concat', 'push', and 'unshift' methods can throw a
> RangeError because the length of the array they create would be
> greater than ArrayLengthLimit; the others (excluding 'splice') can
> only throw a RangeError when they operate on an object that is not
> a native array and has an invalid length property.
>   

That's fine I think. If I hadn't initialized "length", I'd want an 
exception rather than undefined --> NaN --> 0 --> n as value of 
"length". But who knows, maybe there's existing code that relies on it, 
in which case this could maybe be a strict mode candidate.
>
> A couple of other changes are also needed outside the main
> Array section (15.4):
>
>
> In section 15.5.4.14 (String.prototype.split), replace
>
> # 3. If limit is undefined, let lim = 23**2 - 1; else
> #    let lim = ToUint32(limit).
>
> with
>   3. If limit is undefined, let lim = ArrayLengthLimit; else
>      let lim = ToArrayLength(limit).
>
>
>   

Good.
> As a minimal change to section 15.3.4.3 (Function.prototype.apply),
> replace:
>
> # If argArray is either an array or an arguments object, the function
> # is passed the (ToUint32(argArray.length)) arguments argArray[0],
> # argArray[1], ..., argArray[ToUint32(argArray.length)–1].
>
> with
>   If argArray is either an Array object or an arguments object, then
>   let n be ToArrayLength(argArray.length); if this does not throw a
>   RangeError, then the function is passed the n arguments argArray[0],
>   argArray[1], ..., argArray[n-1].
>
>
> Alternatively, below is a more ambitious change (desirable in its
> own right) that does the following:
>
>  - make apply handle its argArray generically;
>  - rewrite apply and call using the algorithm notation;
>  - clarify that apply and call return the return value of the called
>    function;
>  - acknowledge that there may be an implementation-defined limit on
>    the number of arguments to a function. (This also needs to be fixed
>    in section 11.2.4.)
>
>   

Nice intentions, I'll have a look at it later.
>> Still, the undefined behavior /  data loss irks me.
>>     
>
> The behaviour is defined, just wrong (*and* array puts are apparently
> misimplemented).
>
>   


Yep, agreed. Thanks for the detailed analysis and suggested fixes!

Regards,
Kent


More information about the Es-discuss mailing list