OpenCL's design is very similar to CUDA's "driver API". CUDA also has a "CUDA C" compiler (nvcc), which works largely like a preprocessor: it takes an enriched C, compiles the GPU parts itself, turns the kernel launches into calls to CUDA's runtime API (which sits on top of the "driver API"), and hands the rest to a real C compiler (GCC or MSVC).

Personally, I tried CUDA back when my GTX 280 was brand new, and it was nice. For parallelism-friendly algorithms such as raytracing, the speedup can be huge (I made a test that ran at 2-3 fps on the CPU using C and at 560+ fps in CUDA).
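To illustrate why raytracing is so parallelism-friendly, here is a minimal sketch in plain C. It is not the original test, just a toy under stated assumptions: a single hypothetical sphere, a pinhole camera at the origin, and made-up names (`shade_pixel`, `render`). The point is that each pixel is computed only from its own coordinates, so the two loops in `render` could in principle be replaced by one GPU thread per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy scene: one sphere, camera at the origin looking down -z. */
typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 if the ray origin + t*dir hits the sphere at some t > 0.
   Assumes the ray origin lies outside the sphere, so both roots of the
   quadratic share a sign and no square root is needed. */
static int ray_hits_sphere(vec3 origin, vec3 dir, vec3 center, float radius)
{
    vec3 oc = { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float a = dot3(dir, dir);
    float b = 2.0f * dot3(oc, dir);
    float c = dot3(oc, oc) - radius * radius;
    float disc = b * b - 4.0f * a * c;
    return disc >= 0.0f && b < 0.0f;
}

/* Shades one pixel. The result depends only on (px, py), never on any
   other pixel, which is what makes the work trivially parallel. */
static unsigned char shade_pixel(int px, int py, int width, int height)
{
    /* Map the pixel centre to [-1, 1] on an image plane at z = -1. */
    float u = 2.0f * ((float)px + 0.5f) / (float)width - 1.0f;
    float v = 2.0f * ((float)py + 0.5f) / (float)height - 1.0f;
    vec3 origin = { 0.0f, 0.0f, 0.0f };
    vec3 dir    = { u, v, -1.0f };
    vec3 center = { 0.0f, 0.0f, -3.0f };
    return ray_hits_sphere(origin, dir, center, 1.0f) ? 255 : 0;
}

/* CPU version: visits the pixels one after another. On a GPU, the body
   of the inner loop would become the kernel. */
void render(unsigned char *image, int width, int height)
{
    assert(image != NULL && width > 0 && height > 0);
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            image[(size_t)py * (size_t)width + (size_t)px] =
                shade_pixel(px, py, width, height);
        }
    }
}
```

Calling `render(buffer, 64, 64)` on a 64*64-byte buffer fills it with a white disc on a black background. A real raytracer does far more work per pixel (bounces, shading, many objects), which is where the gap between CPU and GPU gets large.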

I also tried to use it with Lazarus:



This is basically the same test, slightly modified. The raytracer itself is written in "CUDA C" and linked with the Lazarus program, which does the presentation. It is slower because I'm downloading the image I get from CUDA to the CPU and then uploading it back to the GPU through Lazarus' OpenGL control, while the C version didn't do that part. I don't remember why I did it that way in Lazarus, though... it has been about two years since I wrote it.

Today, however, I would personally use OpenCL instead. It is a proper open standard, it is more widely supported than CUDA, and some implementations (like those from AMD and Apple, I think) can use both the CPU and the GPU at the same time.