typed array filling convenience AND performance
jwalden+es at mit.edu
Thu Nov 6 16:49:14 PST 2014
On 10/31/2014 08:40 AM, Katelyn Gadd wrote:
> In my testing, even best-case optimized memcpy/memset loops did not
> produce the kind of efficient code you want. There were always bounds
> checks and in some cases I saw other overhead. The set/copy definitely
> weren't vectorized. In the best case I saw bounds checks getting
> hoisted out of the loop body (great!)
It's my impression, across many compilers and languages, that completely correct range analysis (a prerequisite to eliminating many bounds checks) is very, very hard.
When people have used Csmith and similar on gcc and clang, they've found range analysis bugs (or issues in systems producing inputs to range analysis) with ease. These are seasoned compilers where one might think such issues had long since been eliminated.
People find similar issues fuzzing SpiderMonkey. Range analysis has been in SpiderMonkey awhile now, yet the incoming stream of range analysis issues continues unabated. We "trust" range analysis information for DCE purposes and constant folding because a wrong value/behavior is not immediately exploitable. But we've been *very* hesitant to trust it to eliminate bounds checks. The consequences there are memory corruption and arbitrary code execution. When the range analysis information is known buggy, it'd be folly to trust it to remove bounds checks.
I'd also note C/C++ and similar range analysis code usually has the advantage of working with stable, integer-xor-floating types. JS with its double type requires considerably more care to suss out an implicit integer subtype, when double operations can produce integers, integer operations can produce doubles, and negative zero constantly confounds. Range analysis for those languages is easier than it is for JS, and yet they're still buggy.
> I feel like at this point the claim that we don't need better copy/set
> APIs is the 'sufficiently smart compiler' trope all over again, though
> in this case perhaps it is wholly possible and it's just not happening
> because it isn't important enough to VM implementers.
"not important enough" is true, but it requires a big-picture view. Your narrow focus is on typed arrays and native-equivalent code. There's practically a whole web of code dissimilar to that, that also needs optimizing. There are new standard features to implement. There are security issues to fix. There's low-level architecture to get right so that it's possible to build upon basic optimizations and eventually implement bigger ones (such as the vectorization ideas you mention here). We'll get to this eventually, but it has to compete with the other important concerns on the table.
P.S. -- For what it's worth, I think we'll care a bit more about this soon. We may need to do bits of this to permit self-hosting of SpiderMonkey's typed array methods without losing performance, and we really want to make that change soon. Don't give up hope!
More information about the es-discuss