Proposal: Array.prototype.shuffle (Fisher-Yates)

Isiah Meadows isiahmeadows at gmail.com
Mon Apr 30 05:11:41 UTC 2018


Replying inline, so I can target specific points better.

-----

Isiah Meadows
me at isiahmeadows.com

Looking for web consulting? Or a new website?
Send me an email and we can get started.
www.isiahmeadows.com


On Sun, Apr 29, 2018 at 7:31 PM, J Decker <d3ck0r at gmail.com> wrote:
> I can see that's certainly something that can be gotten wrong; the in-place
> sort is kind-of nice; but you end up hitting most numbers twice, and with a
> longer array (say a set of 75 bingo balls) you can move the same number
> multiple times.

I specifically mentioned this (the easy to screw up) in the link I
created and posted here.

>
> which gives numbers at the end a higher chance of being at the start when
> all is done; and almost guaranteed to not be at the end.  I use sha2 as a
> stream of bits; and when it runs out of bits, invokes a callback to get more
> salt ( that is optional, although adding Date.now() is sufficient to
> increase the overall entropy slightly  ).   This can also be used as a
> generator procedural content generation, since you can specify the initial
> salt and subsequent salts,  or even copy the state, and fork two streams
> using slightly different progressive entropy.
> https://github.com/d3x0r/-/blob/master/org.d3x0r.common/salty_random_generator.js
>
> And then to shuffle, for each number in the array, assign a random number;
> hang in a binary tree from least to most; then gather the tree back out.
> requires additional memory approximately the 4xsize of the original array
> tough temporarily.  (don't seem to have a JS version, but the C version
> ports pretty easily)
> https://github.com/d3x0r/SACK/blob/master/src/utils/cardodds/shuffle.c

For an in-place shuffle, that's still not especially efficient. It's
still taking O(n log n) stack space + global space + time, even though
it doesn't require real heap allocation.

>
> This allows any number to result in any position equally.
> The SHA2 hash generates very good random number characteristics; evaluating
> with diehard tests; chi-square etc.
> Allowing more salt to be introduced periodically destroys any overall period
> (even if it is in the realm of 256 bits).

Basing it on a cryptographic primitive is what I was referring to for
larger arrays. You need one that powerful for large arrays if you hope
to maintain any sort of statistical soundness. However, for small
arrays (like a "deck" of cards), it is likely overkill.

>
> if you were shuffling a deck of cards (52) you only need random values that
> are 6-7 bits at a time...
>
> It is slower than say http://www.pcg-random.org/using-pcg.html or mersenne,
> both of which I found to be very poor.  pcg generates zeros like 70% of the
> time.

If it's generating zeros that frequently, don't use it. The website
itself looks a bit on the sketchy side to me, but that's just me.\* A
Mersenne Twister with a large state shouldn't give you too much
trouble for simple stuff, and if you take the SFMT variant [1], you
shouldn't have speed issues too much. Also, note that I did specify a
higher-quality iteration on that, WELL [2], which is similar, but
improves on both the statistical properties and speed of the
traditional Mersenne Twister.

\* If you're not cryptographically secure, don't compare yourself to
ones that are, especially trying to make yourself look better than
them. Also, it missed a few other high quality generators (like the
xorshift128 family and xorshiro128+, two of the most efficient,
effective fast PRNGs out there) in its online comparison.

[1]: https://en.wikipedia.org/wiki/Mersenne_Twister#SFMT
[2]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear

>
> While a good shuffle should be able to start from the initial state and
> generate a good shuffled result, it is slightly better to progressively
> shuffle the previous result array into new positions than to start from an
> initial state and compute a 1-off.

Evidence? (BTW, the Fisher-Yates shuffle is theoretically as biased as
the RNG that powers it.)

>
> On Sun, Apr 29, 2018 at 3:27 PM, Isiah Meadows <isiahmeadows at gmail.com>
> wrote:
>>
>> BTW, I added this to my list of various proposed array additions (as a
>> weak one) [1].
>>
>> I did do a little reading up and found that in general, there's a
>> major glitch that makes shuffling very hard to get right (issues even
>> Underscore's `_.shuffle` doesn't address), specifically that of the
>> size of the random number generator vs how many permutations it can
>> hit. Specifically, if you want any properly unbiased shuffles covering
>> all permutations of arrays larger than 34 entries (and that's not a
>> lot), you can't use any major engines' `Math.random` (which typically
>> have a max seed+period of 128 bits). You have to roll your own
>> implementation, and you need at least something like xorshift1024* [2]
>> (1024-bit max seed/period size) for up to 170 entries, MT19937
>> [3]/WELL19937 [4] (seed/period up to 2^19937-1) for up to 2080
>> entries, or MT44497/WELL44497 for up to 4199 entries. Keep in mind the
>> minimum seed/period size in bits grows roughly `ceil(log2(fact(N)))`
>> where `N` is the length of the array [5], and the only way you can
>> guarantee you can even generate all possible permutations of all
>> arrays (ignoring potential bias) is through a true hardware generator
>> (with its potentially infinite number of possibilities). Also, another
>> concern is that the loop's bottleneck is specifically the random
>> number generation, so you can't get too slow without resulting in a
>> *very* slow shuffle.
>>
>> If you want anything minimally biased and much larger than that,
>> you'll need a cryptographically secure pseudorandom number generator
>> just to cover the possible states. (This is about where the
>> intersection meets between basic statistics and cryptography, since
>> cipher blocks are frequently that long.) But I'm leaving that as out
>> of scope of that proposal.
>>
>> [1]:
>> https://github.com/isiahmeadows/array-additions-proposal#arrayprototypeshuffle
>> [2]: https://en.wikipedia.org/wiki/Xorshift#xorshift*
>> [3]: https://en.wikipedia.org/wiki/Mersenne_Twister
>> [4]: https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear
>> [5]:
>> http://www.wolframalpha.com/input/?i=plot+ceiling(log2(x!))+where+0+%3C+x+%3C+10000
>>
>> -----
>>
>> Isiah Meadows
>> me at isiahmeadows.com
>>
>> Looking for web consulting? Or a new website?
>> Send me an email and we can get started.
>> www.isiahmeadows.com
>>
>>
>> On Sun, Apr 29, 2018 at 4:01 PM, Alexander Lichter <es at lichter.io> wrote:
>> > On 29.04.2018 18:34, Naveen Chawla wrote:
>> >>
>> >> I don't think there's such a thing as "real random" in digital algos,
>> >> just
>> >> "pseudo random".
>> >
>> > You are right. I meant 'unbiased' pseudo randomness here.
>> >
>> >> Apart from card games, what's the use case?
>> >
>> > There are a lot of potential use cases. The best that comes into my mind
>> > is
>> > sampling test data.
>> >
>> >
>> > On 29.04.2018 19:01, Isiah Meadows wrote:
>> >>
>> >> I think this would be better suited for a library function rather than
>> >> a
>> >> language feature. I could see this also being useful also for
>> >> randomized displays, but that's about it. And I'm not sure what an
>> >> engine could provide here that a library couldn't - you can't really
>> >> get much faster than what's in the language (minus bounds checking,
>> >> but the likely frequent cache misses will eclipse that greatly), and
>> >> it's not unlocking any real new possibilities.
>> >
>> > As Tab Atkins Jr. already pointed out it's not about performance
>> > benefits.
>> > It's about how error-prone custom shuffle implementations are/can be.
>> >
>> > _______________________________________________
>> > es-discuss mailing list
>> > es-discuss at mozilla.org
>> > https://mail.mozilla.org/listinfo/es-discuss
>> _______________________________________________
>> es-discuss mailing list
>> es-discuss at mozilla.org
>> https://mail.mozilla.org/listinfo/es-discuss
>
>


More information about the es-discuss mailing list