Details on how I used the NAS benchmarks.

The speeds reported are from the NPB1 benchmarks for the class A FT problem, which performs FFTs on a 256x256x128 grid (class B uses a 512x256x256 grid and so would scale to a larger computer). The figure I've put together is a composite taken from a couple of different sources. In essence, it is a selective merger of the NPB2 results at:
and the NPB1 results at

Most of the actual numbers came out of tables in I have converted their normalized results to MFLOP/sec rates, using their quote of 196 MFLOP/sec for the Cray Y-MP on this FT problem. The NEC SX4 results for the class A FT are missing, so I took the results from the class B FT (which is on a larger grid).
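
As a rough illustration of the conversion described above, here is a minimal Python sketch. It assumes that a "normalized" result is a speed ratio relative to the Cray Y-MP on the same problem, which is my reading of the tables rather than something stated in them. The function name and the sample ratio are hypothetical; only the 196 MFLOP/sec figure comes from the text.

```python
# Quoted rate for the Cray Y-MP on the class A FT problem (from the text above).
CRAY_YMP_FT_CLASS_A_MFLOPS = 196.0


def normalized_to_mflops(ratio_to_ymp, reference_mflops=CRAY_YMP_FT_CLASS_A_MFLOPS):
    """Convert a normalized result to MFLOP/sec.

    Assumption: ratio_to_ymp is (machine speed) / (Cray Y-MP speed),
    so a ratio of 1.0 reproduces the Y-MP rate itself.
    """
    return ratio_to_ymp * reference_mflops


# Hypothetical example: a machine listed at 2.5 times the Y-MP speed.
example_rate = normalized_to_mflops(2.5)
print(example_rate)
```

Because the conversion is a single multiplication, any uncertainty in the 196 MFLOP/sec reference rate scales every converted number by the same factor and leaves the relative ordering of the machines unchanged.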

Results for a T3E-900 are not directly given for the NPB1 class A FT. Instead, I used the results for the T3E-900 on the NPB2 class A FT problem, as given in the NPB2 section of the NAS web site, and corrected for the ratio of NPB1 to NPB2 compute time also given there for the T3E-900. The NPB2 benchmarks are written in MPI, with minimal modifications to run on each computer. The NPB1 FFT benchmarks are allowed to make heavy use of vendor-written optimized machine code. For the class A FT problem on the T3E-900, NPB1 is about 2-4 times faster than NPB2 (depending on the number of processors).
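
The T3E-900 correction can be sketched the same way. This is a minimal Python example, assuming the correction is a simple scaling of the NPB2 rate by the ratio of NPB2 to NPB1 compute time at the same processor count. The function name and all numbers below are hypothetical placeholders, not values from the NAS tables.

```python
def estimate_npb1_mflops(npb2_mflops, npb2_time_s, npb1_time_s):
    """Estimate an NPB1 rate from an NPB2 rate on the same machine.

    Assumption: both runs solve the same class A FT problem on the same
    number of processors, so the rate scales inversely with compute time.
    """
    speedup = npb2_time_s / npb1_time_s  # how many times faster NPB1 runs
    return npb2_mflops * speedup


# Hypothetical inputs: NPB2 at 1000 MFLOP/sec taking 30 s, NPB1 taking 10 s.
estimated = estimate_npb1_mflops(npb2_mflops=1000.0, npb2_time_s=30.0, npb1_time_s=10.0)
print(estimated)
```

Since the NPB1 to NPB2 ratio varies between about 2 and 4 with the number of processors, the correction should be applied separately for each processor count rather than as a single average factor.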