Manuel M T Chakravarty
chak at cse.unsw.edu.au
Sun Aug 8 23:00:13 EDT 2010
> We can definitely see how forcing the focal point to the middle makes things a lot easier. Our only question is whether there is enough information to determine if there are elements not being used in the case where the focal point is not centred.
Even if we do not specially optimise for unused elements at the moment, the abstract syntax carries enough information to add such an optimisation later. (The cue is that the variables corresponding to the unused elements do not occur in the body of the stencil function.)
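The cue above can be illustrated with a toy sketch. It assumes a hypothetical expression language in which a stencil function body refers to the i-th stencil element as `Var i`; this is not Accelerate's actual abstract syntax, and `Expr`, `usedVars` and `unusedVars` are illustrative names only. Finding the unused elements amounts to collecting the variables that occur in the body and reporting the bound ones that do not.

```haskell
import qualified Data.Set as Set

-- Hypothetical toy AST for a stencil function body (not Accelerate's real AST).
data Expr
  = Var Int          -- reference to the i-th element of the stencil pattern
  | Lit Double       -- numeric literal
  | Add Expr Expr
  | Mul Expr Expr
  deriving Show

-- Indices of the stencil elements that the body actually references.
usedVars :: Expr -> Set.Set Int
usedVars (Var i)   = Set.singleton i
usedVars (Lit _)   = Set.empty
usedVars (Add a b) = usedVars a `Set.union` usedVars b
usedVars (Mul a b) = usedVars a `Set.union` usedVars b

-- Elements bound by an n-element stencil pattern (indices 0 .. n-1)
-- that never occur in the body; these need not be loaded at all.
unusedVars :: Int -> Expr -> [Int]
unusedVars n body = [ i | i <- [0 .. n - 1], i `Set.notMember` used ]
  where
    used = usedVars body
```

For a three-element 1D stencil `(l, c, r)` whose body is `l + r`, i.e. `Add (Var 0) (Var 2)`, `unusedVars 3` reports `[1]`: the centre element is bound but never read.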
> For example, we have implemented an algorithm that uses a 36x36 stencil with the focal point in the upper right corner. This would now require specifying a 73x73 stencil, to centre the focal point, where 4033 (73x73 - 36x36) elements are not required. Once elements are dragged into shared memory, a lot of the unused elements will get used up by neighbouring stencils (depending on block size), but there will still be wastage around the edge of the kernel block.
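The padding cost of centring can be computed directly. The following is a minimal sketch, assuming a square k×k stencil whose focal point sits at a given 0-based (row, column) position; `centredDims` and `wastedElements` are hypothetical helpers, not part of Accelerate. In each dimension, the centred stencil must reach as far as the farther edge on both sides, so its side length is twice that reach plus one.

```haskell
-- Hypothetical helper: dimensions of the smallest centred (odd-sized) stencil
-- covering a k-by-k stencil whose focal point is at 0-based (row, col).
centredDims :: Int -> (Int, Int) -> (Int, Int)
centredDims k (row, col) = (side row, side col)
  where
    reach i = max i (k - 1 - i)  -- distance from the focal point to the farther edge
    side i  = 2 * reach i + 1    -- the same reach on both sides, plus the focal point

-- Hypothetical helper: number of elements in the centred stencil that the
-- original k-by-k stencil does not need.
wastedElements :: Int -> (Int, Int) -> Int
wastedElements k focal = h * w - k * k
  where
    (h, w) = centredDims k focal
```

For example, a 5x5 stencil with its focal point in the upper right corner, `wastedElements 5 (0, 4)`, needs a 9x9 centred stencil, leaving 56 of its 81 elements unused; an already centred 5x5 stencil wastes nothing.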
How important is that 36x36 stencil to you at the moment? Is it used in the AutoMap code? I'm asking because it will be hard to do with the current approach: stencil patterns are written as nested tuples of elements, so each dimension of the stencil needs a tuple with that many components. At the moment, Accelerate only supports tuples with up to 5 components. This is easy to extend, but it is somewhat tedious, and I wasn't anticipating going beyond, say, 10- or 15-tuples. Even the Haskell standard libraries don't predefine class instances etc. for tuples with more than 15 components (although GHC's code generator can handle up to 100-tuples).
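To see why stencil size is tied to tuple size, here is a sketch of the nested-tuple style in plain Haskell. The synonyms mirror the style of Accelerate's stencil patterns, but plain values stand in for Accelerate's embedded expressions, so this is an illustration under that assumption rather than the library's definitions; `blur3x3` is a hypothetical example function.

```haskell
-- Hypothetical synonyms in the nested-tuple style: one component per element.
type Stencil3 a   = (a, a, a)
type Stencil3x3 a = (Stencil3 a, Stencil3 a, Stencil3 a)

-- A 3x3 box blur: the pattern binds all nine elements by name.
blur3x3 :: Stencil3x3 Double -> Double
blur3x3 ((a, b, c), (d, e, f), (g, h, i)) =
  (a + b + c + d + e + f + g + h + i) / 9
```

In this style, a stencil that is n elements wide needs n-component tuples, which is why very large stencils run into the tuple-size limits mentioned above.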
I'm happy to look at stencil functions with large stencils somewhere down the road, but it would simplify matters if we could leave that until later.