Levenshtein, BKTree & OpenCL, yay or nay?
Hi,
I'm writing a fairly processing heavy app for nearest neighbor lookup of strings.
To speed it up I'm using a BK-Tree written in pure C (with quite a few other optimizations, such as iterative lookup, instead of recursive, to reduce function calls, etc.), which reduces processing quite a bit. (don't worry, I did quite a lot benchmarks, before doing any random and likely premature optimizations ;) )
However I figured I might want to make use of the GPU for the stuff that it is particularily good for compared to the CPU. which would most likely be the levenshtein algorithm in which most of the overall processing time is still spent anyway.
So I did a quick search and found an article about implementing a levenshtein-like automata for the GPU:
http://mohamedfahmed.wordpress.com/2010/04/19/approximate-string-matching-on-gpgpus/
(which a speed boost of about 48x compared to the CPU, yay!)
Not quite what I'd need (as on branch nodes I need the actual distance, not a boolean), but I think implementing levenshtein in OpenCL shouldn't be impossible, but rather straight forward, no?
Anyway, I'm not asking for any implementation help directly for the levenshtein algorithm, but rather asking for some comments on the overall usefullness of using OpenCL for this.
Potential problems I see (disclaimer: never worked with OpenCL or GPU in general though):
I do not know in advance of which tree nodes I need to calculate the levenshtein distance to my search string. So I'd most likely need to fork the levenshtein calculation to the GPU right on demand. And probably an even bigger problem: I'd need the CPU to wait for the GPU to finish, as it needs the distance to decide in which sub-tree to continue tree traversal :(
Single calls of the levenshtein algorithm on the CPU are pretty darn fast already, so I fear I might not get much benefit from the GPU if for every single call I'd have to wait for data disposal to the GPU, then calculation on it, and last but not least data receival upon calculation, while I could just have called it on the CPU itself.
Has anybody here had any first party experience with stuff like this?
Thanks in advance!



