parallel arrays and sorting

Kevin Gadd kevin.gadd at gmail.com
Thu Apr 11 11:42:34 PDT 2013


One simple-ish real world example of a use case that can benefit from a
parallel sort:

It's common in games and other realtime rendering scenarios to want to sort
your queue of rendering operations by various attributes, in order to
minimize the number of hardware state changes and allow you to batch up
operations.

http://realtimecollisiondetection.net/blog/?p=86 is an article that
describes an approach used to generate 64-bit 'sort keys' for drawing
operations that was used in some current-gen console games. At present, I
use a similar technique to sort drawing operations in my buffers, and more
importantly, I do all the sorting and preparation operations in parallel.
For complex scenes I could imagine it being quite common to want to
leverage parallel machinery (or even GPU hardware) to sort the thousands of
individual drawing operations in a given scene before clustering them up
into larger hardware operations.

The proposed compromise of providing a function that maps the objects being
sorted to 'sort keys' seems a suitable choice for this kind of scenario -
the draw calls can have a sort key implicitly (like described in that blog
post), or you can trivially manufacture one (that's what I do). This is
definitely a use case where only being able to sort primitives would not be
adequate. However, a sort primitive that operates on a pair of arrays would
also suffice - Norm calls this 'sort by key', and the .NET BCL exposes this
as an overload of Array.Sort. It's quite useful and it seems like it might
be feasible to expose this, even if only the key-sorting occurs in parallel
(and the value sorting can't because the values are ordinary JS objects).


On Thu, Apr 11, 2013 at 11:35 AM, Norm Rubin <nrubin at nvidia.com> wrote:

> Good set of questions about sort,  thanks for looking at this carefully.
> You guys have  been great about responding to my emails.****
>
> ** **
>
> ** **
>
> ** **
>
> One sorting application that I would expect is physics particle
> simulation, where the code  might sort objects based on distance from the
> viewer. This could be used so that  far away objects get rendered smaller.
> ****
>
> ** **
>
> Another might be  computing a histogram  over a visual image – sort the
> data first so nearby values will go into the same bucket, without needing
> atomic synchronization.****
>
> ** **
>
> Sorting with an optional compare seems ok  to me- even if implementations
> can only optimize for  some  comparisons ****
>
> ** **
>
> One  interesting  question you brought up as  part of sorting by key is
> pretty tricky.****
>
> ** **
>
> In general parallel data arrays  would like to be stored in transposed
>  order from classic cpu styles.****
>
> We usually call this array of structures or structures of arrays****
>
> Do we sort  an array of objects each with multiple fields stored  one
> object after another or do we sort multiple arrays,  with all the values of
> field1 followed by all the values of field2 etc ****
>
> ** **
>
> I’d hope we allow  implementations to reorder the data within a
> parallelArray as they like without needing to expose the layout to
> developers.  Thrust, on the other hand, pushes the layout  to the developer
> and thus has a significant set of transpose routines.****
>
> ** **
>
> Just for a point of reference there are  8  versions of sort in the thrust
> library. As  you can see, Thrust never aimed to be a minimal set it just
> gained operations as applications appeared. AMD has a similar library
> called bolt which uses the same interface.****
>
> Two different parallel  libraries with the same routines does suggest that
> this set is enough  to do useful work.****
>
> ** **
>
> ** **
>
> Internally thrust notices  that the compare is  <  on primitive  types and
>  uses radix sort (on the gpu).****
>
> ** **
>
> Thurst includes:****
>
>   Stable and unstable forms****
>
> Ascending only or with a comparison function****
>
> Single array or a pair of keys and values ****
>
> ** **
>
> **1)      **Sort(array)  - sorts the array into ascending order, not
> guaranteed to be stable****
>
> **2)      **Sort (array)– with a comparison function****
>
> **3)      **Sort by key (keys, values) returns both the reordered keys
> and  the reordered values, sorted by the array keys. This one is kind of
> confusing so here is an example****
>
> ** **
>
> // an example of key sorting****
>
> #include <thrust/sort.h<http://docs.thrust.googlecode.com/hg/sort_8h.html>
> >****
>
>   const int N = 6;****
>
>   int    keys[N] = {  1,   4,   2,   8,   5,   7};****
>
>   char values[N] = {'a', 'b', 'c', 'd', 'e', 'f'};****
>
>   thrust::sort_by_key<http://docs.thrust.googlecode.com/hg/group__sorting.html#ga2bb765aeef19f6a04ca8b8ba11efff24>(keys,
> keys + N, values, thrust::greater<int><http://docs.thrust.googlecode.com/hg/structthrust_1_1greater.html>
> ());****
>
>   // keys is now   {  8,   7,   5,   4,   2,   1}****
>
>   // values is now {'d', 'f', 'e', 'b', 'c', 'a'}****
>
> ** **
>
> ** **
>
> **4)      **Sort by key with a  comparison function****
>
> ** **
>
> 5-8 the stable forms ****
>
> ** **
>
> ** **
>
> Based on the thrust primitives, one  parallel array sort  might be ****
>
> ** **
>
> Sort an array of objects ****
>
> Besides the array the arguments might be ****
>
> - a flag indicating if the sort must be  stable****
>
> - an optional user compare function that defaulted to < on primitive types
> ****
>
> ** **
>
> ** **
>
> *From:* Herhut, Stephan A [mailto:stephan.a.herhut at intel.com]
> *Sent:* Thursday, April 11, 2013 1:09 PM
> *To:* Norm Rubin; es-discuss at mozilla.org
> *Subject:* RE: parallel arrays and sorting****
>
> ** **
>
> Rick is travelling, so let me chime in.****
>
> ** **
>
> We have discussed this back and forth but have not come to a conclusion.
> Generally, we agree that adding a sort primitive makes a lot of sense, in
> particular as the Array object in JavaScript already has a sort method.
> Also, as you too mentioned, implementing an efficient sort as a library
> function without knowing the details of the parallel hardware used is
> difficult, to say the least. So sort ticks all the boxes to become a
> primitive.****
>
> ** **
>
> The other, and arguably more difficult question, is what a sort method
> should look like. If we take JavaScript’s existing Array.sort, the sort
> method would get an (optional) comparator function. However, using a
> comparator would preclude the use of radix sort. ****
>
> An alternative would be to implement sorting of primitive types only. This
> brings back more choice in sort algorithms but limits use. For such a
> design, we considered a function as optional argument to sort that, given a
> value from the ParallelArray to be sorted, returns a key used for
> comparison, which again needs to be a primitive. This would at least enable
> sorting of objects by a field and, at some runtime cost, sorting of general
> data. ****
>
> ** **
>
> The tradeoff between these two approaches, and probably other designs, is
> hard to judge without knowing what sort is used for. So we decided to wait
> for some good use cases before deciding on a specific design.****
>
> ** **
>
> Sorry, no answers only further questions.****
>
>   Stephan****
>
> ** **
>
> *From:* es-discuss-bounces at mozilla.org [
> mailto:es-discuss-bounces at mozilla.org <es-discuss-bounces at mozilla.org>] *On
> Behalf Of *Norm Rubin
> *Sent:* Monday, April 08, 2013 7:46 AM
> *To:* es-discuss at mozilla.org
> *Subject:* parallel arrays and sorting****
>
> ** **
>
> In comparing ParallelArrays (rivertrail) to the cuda thrust library,  I
> noticed  that Sorting using parallelArrays looks like a missing primitive
>  operation.****
>
> ** **
>
> Sorting is  special because the good (high performance) algorithm on a gpu
> is radix sort, while the good (high performance) algorithm on the cpu is
> parallel merge sort,  The other way around radix sort of cpu, or parallel
> merge sort of a gpu is very slow and often worse than serial implementations
> ****
>
> ** **
>
> Sadly it is well past a jit to take the code for one flavor of sort and
> transform it into the other,  while  it would be pretty simple for a
> run-time to pick a good sort, if only it knew that a sort was going on.
> Run times would not be required to do  so  here since they can always pick
> a slow sort. ****
>
> ** **
>
> I know this is a slippery road, since once you add another prim, adding
> more prims becomes ever easier  but sorting seems pretty important.   And
> array already has a sort, so adding a version that works on ParallelArrays
> does not seem so bad.****
>
> ** **
>
> What do you guys think?****
>
> ** **
>
> ** **
> ------------------------------
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information.  Any unauthorized review, use,
> disclosure or distribution is prohibited.  If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message. ****
> ------------------------------
>
> _______________________________________________
> es-discuss mailing list
> es-discuss at mozilla.org
> https://mail.mozilla.org/listinfo/es-discuss
>
>


-- 
-kg
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.mozilla.org/pipermail/es-discuss/attachments/20130411/404b9b2e/attachment-0001.html>


More information about the es-discuss mailing list