OpenCL FFT

Apple posted OpenCL FFT sample code this past month. When I run the default parameter test suite, I get "test failed" on my setup (an ATI Radeon 4870 in an 8-core Mac Pro). It returns performance numbers of 8.5 and 13.3 GFlops for the 64- and 1024-point one-dimensional FFTs, respectively, but the L2 errors are huge.

I haven't dug into the code at all, but what kind of results are you seeing? Similar errors? Better numbers?

Re: OpenCL FFT

This is what I get on an NVIDIA GTX285

NVIDIA
GeForce GTX 285
Performance Number GFlops achieved for n = (64, 1, 1), batchsize = 8192 (in GFlops/s): 96.6763
Test passed (n=(64, 1, 1), batchsize=8192): out-of-place Test: rel. L2-error = 1.156323 eps (max=1.566783 eps, min=0.836761 eps)
Performance Number GFlops achieved for n = (1024, 1, 1), batchsize = 8192 (in GFlops/s): 135.934
Test passed (n=(1024, 1, 1), batchsize=8192): out-of-place Test: rel. L2-error = 2.161172 eps (max=2.335403 eps, min=2.014934 eps)
Performance Number GFlops achieved for n = (1048576, 1, 1), batchsize = 4 (in GFlops/s): 104.324
Test passed (n=(1048576, 1, 1), batchsize=4): out-of-place Test: rel. L2-error = 4.618438 eps (max=4.620894 eps, min=4.616305 eps)
Performance Number GFlops achieved for n = (1024, 512, 1), batchsize = 8 (in GFlops/s): 127.76
Test passed (n=(1024, 512, 1), batchsize=8): out-of-place Test: rel. L2-error = 3.463954 eps (max=3.467798 eps, min=3.459424 eps)
Performance Number GFlops achieved for n = (128, 128, 128), batchsize = 1 (in GFlops/s): 108.349
Test passed (n=(128, 128, 128), batchsize=1): out-of-place Test: rel. L2-error = 3.227776 eps (max=3.227776 eps, min=3.227776 eps)
Performance Number GFlops achieved for n = (16384, 1, 1), batchsize = 4 (in GFlops/s): 0.512414
Test passed (n=(16384, 1, 1), batchsize=4): in-place Test: rel. L2-error = 3.149706 eps (max=3.155182 eps, min=3.145259 eps)
Performance Number GFlops achieved for n = (32, 2048, 1), batchsize = 8 (in GFlops/s): 2.22822
Test passed (n=(32, 2048, 1), batchsize=8): in-place Test: rel. L2-error = 2.954279 eps (max=2.960348 eps, min=2.942896 eps)
Performance Number GFlops achieved for n = (4096, 64, 1), batchsize = 4 (in GFlops/s): 3.28517
Test passed (n=(4096, 64, 1), batchsize=4): in-place Test: rel. L2-error = 3.304146 eps (max=3.306858 eps, min=3.299225 eps)
Performance Number GFlops achieved for n = (64, 32, 16), batchsize = 1 (in GFlops/s): 0.305888
Test passed (n=(64, 32, 16), batchsize=1): out-of-place Test: rel. L2-error = 1.995782 eps (max=1.995782 eps, min=1.995782 eps)
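
For anyone trying to read the two kinds of numbers in the log above, here is a rough sketch of how figures like these are typically computed. It is not taken from Apple's sample: the function names are made up for illustration, the error is assumed to be measured against a higher-precision reference FFT and scaled by single-precision machine epsilon (2^-23), and the GFlops figure assumes the common 5 N log2(N) operation-count convention for a complex FFT.

```python
import math

# Single-precision machine epsilon (FLT_EPSILON in C), assumed to be the "eps" unit.
FLOAT_EPS = 2.0 ** -23


def rel_l2_error_eps(result, reference, eps=FLOAT_EPS):
    """Relative L2 error of `result` against `reference`, in units of eps.

    Both arguments are equal-length sequences of real or complex numbers.
    (Hypothetical helper, not the sample's own routine.)
    """
    num = sum(abs(r - t) ** 2 for r, t in zip(result, reference))
    den = sum(abs(t) ** 2 for t in reference)
    return math.sqrt(num / den) / eps


def fft_gflops(dims, batch, seconds):
    """Nominal GFlops using the 5 * N * log2(N) convention (an assumption).

    `dims` is the transform size, e.g. (1024, 512, 1); `batch` is the number
    of transforms timed; `seconds` is the wall-clock time for all of them.
    """
    n = math.prod(dims)
    return 5.0 * n * math.log2(n) * batch / seconds / 1e9


if __name__ == "__main__":
    # A result that is off by exactly one eps relative to the reference.
    print(rel_l2_error_eps([1.0 + FLOAT_EPS], [1.0]))
    # Nominal rate if 8192 transforms of length 1024 took one second in total.
    print(fft_gflops((1024, 1, 1), 8192, 1.0))
```

Under these assumptions, an error of a few eps (as in the GTX 285 run) means the result agrees with the reference to within single-precision rounding, while a "huge" value points at a wrong result rather than accumulated rounding.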

Yeah okay -- those are more like the numbers I was expecting.