OpenCL FFT
By JeffreyEarly at Thu, Dec 10 2009 10:24pm
Apple posted OpenCL FFT sample code this past month. When I run the default parameter test suite, I get "test failed" on my setup (an ATI Radeon 4870 in an 8-core Mac Pro). It reports 8.5 and 13.3 GFlops for the 64- and 1024-point one-dimensional FFTs, respectively, but the L2 errors are huge.
I haven't dug into the code at all, but what kind of results are you seeing? Similar errors? Better numbers?




Re: OpenCL FFT
This is what I get on an NVIDIA GTX285
NVIDIA
GeForce GTX 285
Performance Number GFlops achieved for n = (64, 1, 1), batchsize = 8192 (in GFlops/s): 96.6763
Test passed (n=(64, 1, 1), batchsize=8192): out-of-place Test: rel. L2-error = 1.156323 eps (max=1.566783 eps, min=0.836761 eps)
Performance Number GFlops achieved for n = (1024, 1, 1), batchsize = 8192 (in GFlops/s): 135.934
Test passed (n=(1024, 1, 1), batchsize=8192): out-of-place Test: rel. L2-error = 2.161172 eps (max=2.335403 eps, min=2.014934 eps)
Performance Number GFlops achieved for n = (1048576, 1, 1), batchsize = 4 (in GFlops/s): 104.324
Test passed (n=(1048576, 1, 1), batchsize=4): out-of-place Test: rel. L2-error = 4.618438 eps (max=4.620894 eps, min=4.616305 eps)
Performance Number GFlops achieved for n = (1024, 512, 1), batchsize = 8 (in GFlops/s): 127.76
Test passed (n=(1024, 512, 1), batchsize=8): out-of-place Test: rel. L2-error = 3.463954 eps (max=3.467798 eps, min=3.459424 eps)
Performance Number GFlops achieved for n = (128, 128, 128), batchsize = 1 (in GFlops/s): 108.349
Test passed (n=(128, 128, 128), batchsize=1): out-of-place Test: rel. L2-error = 3.227776 eps (max=3.227776 eps, min=3.227776 eps)
Performance Number GFlops achieved for n = (16384, 1, 1), batchsize = 4 (in GFlops/s): 0.512414
Test passed (n=(16384, 1, 1), batchsize=4): in-place Test: rel. L2-error = 3.149706 eps (max=3.155182 eps, min=3.145259 eps)
Performance Number GFlops achieved for n = (32, 2048, 1), batchsize = 8 (in GFlops/s): 2.22822
Test passed (n=(32, 2048, 1), batchsize=8): in-place Test: rel. L2-error = 2.954279 eps (max=2.960348 eps, min=2.942896 eps)
Performance Number GFlops achieved for n = (4096, 64, 1), batchsize = 4 (in GFlops/s): 3.28517
Test passed (n=(4096, 64, 1), batchsize=4): in-place Test: rel. L2-error = 3.304146 eps (max=3.306858 eps, min=3.299225 eps)
Performance Number GFlops achieved for n = (64, 32, 16), batchsize = 1 (in GFlops/s): 0.305888
Test passed (n=(64, 32, 16), batchsize=1): out-of-place Test: rel. L2-error = 1.995782 eps (max=1.995782 eps, min=1.995782 eps)
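For anyone unsure how to read the "rel. L2-error = ... eps" figures above, here is a minimal sketch of one way such a number can be computed. It assumes the metric is the relative L2 norm of the difference between the computed result and a higher-precision reference, divided by single-precision machine epsilon. That is an assumption on my part, not code taken from Apple's sample, and the function names are made up for illustration.

```python
import cmath
import math
import struct

# Single-precision machine epsilon (same value as FLT_EPSILON in C).
FLT_EPSILON = 2.0 ** -23


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reference_dft(signal):
    """Naive O(n^2) DFT in double precision, used only as a reference."""
    n = len(signal)
    return [
        sum(signal[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def rel_l2_error_in_eps(result, reference):
    """Relative L2 error ||result - reference|| / ||reference||, in units of eps."""
    num = math.sqrt(sum(abs(r - t) ** 2 for r, t in zip(result, reference)))
    den = math.sqrt(sum(abs(t) ** 2 for t in reference))
    return num / den / FLT_EPSILON


# Hypothetical stand-in for a GPU result: the reference rounded to float32.
signal = [complex(math.sin(0.3 * j), math.cos(0.7 * j)) for j in range(64)]
reference = reference_dft(signal)
rounded = [complex(to_float32(z.real), to_float32(z.imag)) for z in reference]

print("rel. L2-error = %f eps" % rel_l2_error_in_eps(rounded, reference))
```

Under that assumption, a value of a few eps (as in the passing runs above) means the result is within a handful of rounding errors of the reference, while a "huge" value points to wrong output rather than ordinary floating-point noise.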
Yeah okay -- those are more like the numbers I was expecting.