We showed our latest achievements in quantitative finance at the Frankfurt MathFinance conference; see my Feel The Heat post in UnRisk Insight.
The problem in brief: gaining insight into the model risk of valuing a complex financial instrument requires valuations across models. Each model must be calibrated to market data, and a single calibration might need one million valuations of simpler instruments to identify the parameters, checking at every step how well the model prices fit the market prices.
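To make the calibration step concrete, here is a minimal Python sketch. It assumes a Black-Scholes model with a single volatility parameter as a stand-in for the more complex models discussed here, synthetic market quotes, and a golden-section search in place of a production optimizer; the names `bs_call`, `fit_error` and `calibrate_vol` are hypothetical. What it illustrates is that every step of the search reprices all quoted instruments, which is why a calibration with many parameters can require so many valuations.

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, maturity):
    # Black-Scholes European call: stand-in for one "simpler instrument" valuation
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def fit_error(vol, quotes, spot, rate):
    # Goodness of fit: sum of squared differences between model and market prices.
    # Each call reprices every quoted (strike, maturity, price) instrument.
    return sum((bs_call(spot, k, rate, vol, t) - price) ** 2 for k, t, price in quotes)

def calibrate_vol(quotes, spot, rate, lo=0.01, hi=2.0, tol=1e-8):
    # Golden-section search for the volatility minimizing fit_error on [lo, hi]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if fit_error(c, quotes, spot, rate) < fit_error(d, quotes, spot, rate):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)
```

With quotes generated from a known volatility, `calibrate_vol` should recover that volatility; a real multi-parameter model would replace the one-dimensional search with a multivariate optimizer, multiplying the number of repricings.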
A single such task can take hours on a traditional PC.
If those instruments sit in portfolios and must be tested across scenarios, the calculation time would be counted in weeks or months.
We have developed a clever combination of fast solvers in C++ and implemented parts of them on GPU systems (NVIDIA Tesla). Parallel execution on the GPU does not cope well with if-then-else constructs: to make CUDA code efficient, the GPU cores should all execute the same instructions.
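A toy model of why branching hurts on the GPU: NVIDIA GPUs run threads in groups of 32 (warps), and when the lanes of a warp disagree on an if-then-else, both paths are executed one after the other with the inactive lanes masked off. The Python sketch below is a simplified illustration of that principle, not CUDA code and not our implementation; `branch_passes` and the two payoff functions are hypothetical names. It counts the serialized passes and shows a branch-free formulation of a call payoff that gives every lane the same arithmetic path.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

def branch_passes(lane_conditions):
    """Toy SIMT model: serialized passes one warp needs for a single if-then-else.

    If every lane takes the same branch, only that path runs (1 pass);
    if lanes disagree, both paths run with inactive lanes masked off (2 passes).
    """
    return len(set(lane_conditions))

def call_payoff_branchy(spot, strike):
    # Data-dependent branch: lanes holding different spots may diverge
    if spot > strike:
        return spot - strike
    return 0.0

def call_payoff_branchless(spot, strike):
    # Same arithmetic for every lane: max(x, 0) = (x + |x|) / 2
    x = spot - strike
    return 0.5 * (x + abs(x))

# One warp's worth of spots from 80 to 120 against a strike of 100
spots = [80.0 + 40.0 * i / (WARP_SIZE - 1) for i in range(WARP_SIZE)]
strike = 100.0
print(branch_passes([s > strike for s in spots]))  # prints 2: the warp diverges
```

In a CUDA kernel the branch-free form would typically be written with `fmaxf(s - k, 0.0f)`, which lets all threads of a warp stay on one instruction stream.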
Mathematica 8's CUDA support is a perfect framework for this kind of efficiency testing: our C++ engines are seamlessly integrated into Mathematica, and the CUDA support extends that link structure to CUDA implementations.
On top of all this, we parallelized at the task level, using Mathematica's built-in parallel computing capabilities to distribute valuations across models and other scenarios.
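The coarse-grained, task-level layer can be sketched as follows. Mathematica's parallel tools are not reproduced here; Python's `concurrent.futures` is swapped in as a stand-in, and the models (dictionaries of volatilities), scenarios (spot levels), the Black-Scholes valuation and the helper names `value_task`, `value_across` and `model_spread` are all hypothetical. Each (model, scenario) pair is an independent task, and the spread of prices across models per scenario serves as a crude model-risk indicator.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def value_call(spot, strike, rate, vol, maturity):
    # Black-Scholes call: hypothetical stand-in for one full valuation task
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

def value_task(task):
    # One independent task: value the instrument under one model in one scenario
    (model_name, vol), (scenario_name, spot) = task
    price = value_call(spot, 100.0, 0.02, vol, 1.0)
    return (model_name, scenario_name), price

def value_across(models, scenarios, max_workers=6):
    # Coarse-grained parallelism: distribute all (model, scenario) pairs over workers
    tasks = list(product(models.items(), scenarios.items()))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(value_task, tasks))

def model_spread(results, models, scenarios):
    # Crude model-risk indicator: price range across models for each scenario
    return {
        s: max(results[(m, s)] for m in models) - min(results[(m, s)] for m in models)
        for s in scenarios
    }

models = {"low_vol": 0.15, "high_vol": 0.35}
scenarios = {"down": 90.0, "base": 100.0, "up": 110.0}
results = value_across(models, scenarios)
spread = model_spread(results, models, scenarios)
```

Threads keep the sketch simple, but in CPython they do not speed up pure-Python number crunching; `ProcessPoolExecutor` offers the same interface for real multi-core execution, while the fine-grained work inside each task is what would go to the GPU.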
This intelligent combination of coarse-grained and massive fine-grained parallelism is not only very flexible but also blazingly fast.
Reducing 8 hours to 8 seconds is possible on a 6-core PC with one Tesla C2070 card (448 GPU cores), and by adding more CPU and GPU cores the approach scales fully. We can now calculate results in times that were previously considered impossible.
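For scale, the reported figures correspond to a speed-up factor of more than three orders of magnitude:

$$
\frac{8\ \text{h}}{8\ \text{s}} = \frac{8 \times 3600\ \text{s}}{8\ \text{s}} = 3600
$$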
And we never have to leave our high-level, task-oriented language: we put things together there and change and optimize only at the lower levels.