# Accelerate

Ben Lever Ben.Lever at nicta.com.au
Tue Jul 27 21:24:38 EDT 2010

```Hi Manuel,

(I've sent this email to the accelerate mailing list - is this a better destination for these discussions?)

To handle a 5-point stencil, the shape of the intermediate stencil array can be a one-dimensional array of 5 elements. The location of a, b, c, d and e in the stencil array is given by the ordering of the array as specified by the stencil function. For example:

stencilFn :: dimS -> dim' -> dim
stencilFn i (y, x) =
case i of
0 -> (y - 1,     x)  -- a
1 -> (    y, x - 1)  -- b
2 -> (    y,     x)  -- c
3 -> (    y, x + 1)  -- d
4 -> (y + 1,     x)  -- e

One piece of information we did leave out of our specification is that the stencil reduction operator can expect that the intermediate stencil array will be folded over, sequentially, in row-major order. These semantics allow the stencil reduction operator to treat stencil array elements uniquely if it is important to do so.

Your question, however, does highlight an interesting point, which is that the stencil array doesn't really need to be an array of any shape. Our semantics are that the stencil function simply gathers a bunch of elements in some defined order. Perhaps, the specification of the stencil array extent should be removed from the type signature and replaced with simply the number of elements to gather. For example:

stencil
:: (Ix dim, Ix dim', Ix dimS, Elem a, Elem b)
=> Acc (Array dim a)                                        -- Source array
-> (Exp dim  -> Exp dim')                                   -- Function specifying extent of the destination array
-> Exp DIM1                                                 -- Number of elements gathered by stencil
-> (Exp DIM1 -> Exp dim' -> Exp dim)                        -- Stencil specification function
-> (Exp dim' -> Exp b -> Exp a -> Exp b)                    -- Stencil reduction operator
-> Exp b                                                    -- Stencil reduction identity value
-> Acc (Array dim' b)                                       -- Destination array

Do you think this is a good way to handle situations like a 5-point stencil?

To answer your other question about the stencil specification function (Exp dimS -> Exp dim' -> Exp dim) potentially being too general: we agree that it is general but we see stencil specifications being a regular pattern for gathering input array elements, even if they aren't neighbours. Whilst regularity can not be guaranteed statically, if the stencil pattern is regular, there are similar opportunities for locality-based optimisations.

Regards,
Ben.

On 28/07/2010, at 12:21 AM, Manuel M T Chakravarty wrote:

Hi Ben,

Your revised definition seems like a way to get out of having to use nesting.  One point, I'm wondering about is how you like to represent something like a 5-point stencil

a
bcd
e

where some elements of the stencil array are not used.

Cheers,
Manuel

Hi Manuel,

Rami, Trevor and I made a start on a CUDA backend for the stencil function today, and quickly realised that, in its current form, it is non trivial to implement because it allows for nested data parallelism. After some discussion, we arrived at a modified version of stencil that, whilst being more restrictive, should make the backend simpler and also allow for further optimisations. At this stage, we believe it is sufficient for computer vision use cases.

The attached stencil.hs file contains type signatures of proposed modified stencil functions. The key modifications are:

*   Direct access to the stencil array(s) has been removed to avoid the nested data parallelism problem:
*   Instead, the semantics of the stencil function is to apply a fold reduction over each intermediate stencil array to produce an output array element
*   In the case of stencil2 and higher, multiple intermediate stencil arrays are folded together, element-wise, to produce an output array element
*   For stencil2 and higher, the stencil arrays, produced from elements of the source arrays, are of the same extent - this is necessary to support the element-wise folding

It would be good to get your's and other's thoughts on these modifications.

Trevor says he's happy to convey some of the background reasoning behind this proposal in your group meeting tomorrow. If, however, you think it would be useful for us to join the meeting and explain further, please let us know, and we'll definitely come in.

Regards,

Ben Lever
Senior Researcher Engineer
National ICT Australia

Ben Lever
Senior Researcher Engineer
National ICT Australia

NICTA l Locked Bag 9013 l Alexandria NSW 1435
T + 612 8306 0742 | F +612 9376 2027
www.nicta.com.au<http://www.nicta.com.au/> l ben.lever at nicta.com.au<mailto:rami.mukhtar at nicta.com.au>

```