To achieve this you need to optimize your data structures and algorithms and you might be ably to get significant speed-up by outsourcing special algorithms on GPU co-processors, but you need to distribute your load.
With imperative programming languages (C++, Java, ..), concurrency can only be exploited by writing programs with multiple threads of execution. High-performance multithreading is hard in those languages.
In functional programming languages it is easy.
Mathematica 7's built-in parallel computing represented in its parallel declarative (functional) programming langage is perfect for symbolic, coarser-grain parallelisation. Its extension gridMathematica makes use of pools of network-managed multi-core-multi-compute-node of any scale. It optimizes for the latest hw platforms and you can create minimalist HPC infrastructures.
The valuation of a portfolio:
1. Valuate each portfolio member, an instrument instantiation
2. Accumulate individual instrument prices to obtain the portfolio value
Using a pseudo Mathematica Notation, it looked like this
portfolioValue = Apply[Map[portfolio, Valuate]Accumulate]
where Valuate in real Mathematica Code represents the generic interface to the valuation of ALL deal and instrument types in UnRisk.
To parallelise we changed to
portfolioValue = Apply[ParallelMap[portfolio, Valuate]Accumulate]
That's it? Yes, that is it about. A few lines of code with enormous impact.