gridMathematica 8 brings more than 500 new features of Mathematica 8 to the shared grid engine. One nice example brings both ideas together: driving CUDA hardware in parallel, over the grid.
We use this extensively in UnRisk, multiplying the speedups from massive fine-grain parallelization (with a simple switch, UseGPU -> True) and coarse-grain parallelization (using commands such as ParallelMap and ParallelTable). Together with optimizations of our proprietary algorithms, we can achieve speedups in the many thousands.
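To make the coarse-grain level concrete, here is a minimal sketch of distributing independent valuations over parallel kernels with ParallelMap. The function toyValue and the portfolio list are hypothetical placeholders for illustration, not UnRisk functions. In UnRisk, the per-deal call would additionally carry the UseGPU -> True switch mentioned above; its exact call signature is not given here, so it is left out of the sketch.

```mathematica
(* Start the parallel kernels configured for this session (local or grid). *)
LaunchKernels[];

(* Hypothetical stand-in for a per-deal valuation; NOT an UnRisk function. *)
toyValue[notional_] := 1.05 notional;

(* Make the definition available on every parallel kernel. *)
DistributeDefinitions[toyValue];

(* Hypothetical portfolio: one number per deal. *)
portfolio = {100., 250., 400.};

(* Coarse-grain parallelism: each deal is valued on whichever kernel is free. *)
ParallelMap[toyValue, portfolio]
```

Each list element is evaluated independently, so this pattern fits portfolios of deals that do not depend on each other. The fine-grain GPU work would then happen inside each individual valuation.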
UnRisk programming is high-level programming in a finance-specific language in Mathematica. Consequently, it drives CUDA over the grid.
Our algorithms, over 700,000 lines of C++ code, are numerically optimized for hybrid CPU-GPU systems, but programmers can drive them with amazingly few lines of UnRisk/Mathematica code, including built-in parallelism. They are bank-proof in the valuation and risk management of the most sophisticated deal types and complex portfolios. UnRisk atop Mathematica requires only 250,000 lines of code to represent hundreds of instruments, dozens of models and methods, and many nasty financial details and objects.