OpenCL shared memory and work group size
Hi Folks,
Me again, with another OpenCL question to ponder. Don't be shy; I know there are a lot of "lurkers" out there itching to get stuck into these problems.
While working on the cumulative addition problem, I hit a bug that was caused by shared memory.
Reviewing the excellent MacResearch tutorials, sure enough, tutorial 4 states that work items in a work group can access the same shared memory.
I was adding up 32 numbers with a work group size of 16, so there were two work groups, and each group could not see the other group's shared memory. I need ALL work items to collaborate on the same shared memory to solve the problem.
So, I upped the work group size to 32 and hey presto! Everything works splendidly and the world is a happy place once again.
My question is: for some problems, will that not be a limitation? What if I want to use this algorithm to add 2048 numbers? To keep using shared memory, must I raise my work group size as the problem size demands, or else fall back to global memory?
See my earlier post on Cumulative Addition for more info on the algorithm used.
Any thoughts? Advice as always much appreciated.
Cheers,
Max

Covered in the CUDA doc...
Looks like NVIDIA also covers this in the following CUDA doc:
http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/scan/doc/scan.pdf