parallel arrays and sorting

Norm Rubin nrubin at
Mon Apr 8 07:45:43 PDT 2013

In comparing ParallelArrays (rivertrail) to the cuda thrust library,  I noticed  that Sorting using parallelArrays looks like a missing primitive  operation.

Sorting is  special because the good (high performance) algorithm on a gpu is radix sort, while the good (high performance) algorithm on the cpu is parallel merge sort,  The other way around radix sort of cpu, or parallel merge sort of a gpu is very slow and often worse than serial implementations

Sadly it is well past a jit to take the code for one flavor of sort and transform it into the other,  while  it would be pretty simple for a run-time to pick a good sort, if only it knew that a sort was going on.  Run times would not be required to do  so  here since they can always pick a slow sort.

I know this is a slippery road, since once you add another prim, adding more prims becomes ever easier  but sorting seems pretty important.   And array already has a sort, so adding a version that works on ParallelArrays does not seem so bad.

What do you guys think?

This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

More information about the es-discuss mailing list