Benchmarks of Vector and RISC/cache-based parallel supercomputers

  • Speeds for FFTs on a 256x256x128 grid, from the NAS Parallel Benchmarks (http://science.nas.nasa.gov/Software/NPB). (More info on benchmarks.)

  • The T3E-900 is fastest overall, but it takes 256 T3E processors to match the 32-processor SX-4. Because of Amdahl's law, large, complex programs often require a lot of programming work before they use large numbers of parallel processors efficiently.

  • Theoretical peak speed of the T3E-900 is 900 megaflops per processor, but actual speed on this FFT is 13 times slower. This is a common result on cache-based parallel computers, where the bandwidth between the CPUs and memory is a bottleneck. Vector-based supercomputers such as the Cray C90/T90 use a very fast memory-bank architecture which eliminates such bottlenecks.
  • More info on memory bandwidth problems.
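
The Amdahl's law remark above can be made concrete with a small sketch. The parallel fractions used here are illustrative assumptions, not measurements of the NPB FFT code, and the function name `amdahl_speedup` is chosen for this example only.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    processors: number of processors applied to the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the trend.
    for fraction in (0.90, 0.99, 0.999):
        speedup = amdahl_speedup(fraction, 256)
        efficiency = speedup / 256
        print(f"parallel fraction {fraction:.3f}: "
              f"speedup {speedup:6.1f} on 256 processors, "
              f"efficiency {efficiency:.0%}")
```

Even when 99% of the work is parallel, the remaining serial 1% limits 256 processors to well under a third of their ideal speedup, which is why a machine with many processors needs heavily restructured code to compete with a machine that has a few fast ones.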

    The following table, on the topic of performance per dollar, is taken from http://science.nas.nasa.gov/Software/NPB/Reports/NAS-96-018.fm.html.

    Table 13: Approximate sustained performance per dollar for Class B BT benchmark.

    ------------------------------------------------------------------------------------------------------
    Computer system                            #      Memory  Ratio to   List price    Performance  Date
                                            Proc                 C90/1  (million $)  per million $
    ------------------------------------------------------------------------------------------------------
    CRAY T3E                                 128    64 MB/PE     18.78         5.0            3.76  Nov 96
    DEC Alpha Server 8400 5/440 (437 MHz)      8        2 GB      2.25         0.58           3.88  Nov 96
    Fujitsu VX                                 3   512 MB/PE      5.84         1.11           5.26  Nov 96
    Fujitsu VPP300                             6   512 MB/PE     11.56         1.54           7.51  Nov 96
    Fujitsu VPP700                            34   512 MB/PE     60.83         9.98           6.1   Nov 96
    IBM RS/6000 SP P2SC node (120 MHz)        64   128 MB/PE     13.38         3.52           3.80  Nov 96
    NEC SX-4/32                               32        4 GB     42.57        10.7            3.98  Nov 96
    SGI Origin2000 (195 MHz)                  26        2 GB      6.73         0.96           7.01  Nov 96
    ------------------------------------------------------------------------------------------------------
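
The last column of Table 13 appears to be the "Ratio to C90/1" value divided by the list price in millions of dollars. That reading is inferred from the numbers themselves rather than stated in the source, and the sketch below simply recomputes the column under that assumption. The names `SYSTEMS` and `perf_per_million` are made up for this example.

```python
# (system, ratio to C90/1, list price in million dollars), copied from Table 13.
SYSTEMS = [
    ("CRAY T3E", 18.78, 5.0),
    ("DEC Alpha Server 8400 5/440", 2.25, 0.58),
    ("Fujitsu VX", 5.84, 1.11),
    ("Fujitsu VPP300", 11.56, 1.54),
    ("Fujitsu VPP700", 60.83, 9.98),
    ("IBM RS/6000 SP", 13.38, 3.52),
    ("NEC SX-4/32", 42.57, 10.7),
    ("SGI Origin2000", 6.73, 0.96),
]


def perf_per_million(ratio_to_c90: float, price_million: float) -> float:
    """Sustained performance (in C90/1 units) per million dollars.

    Assumes the table's last column is ratio / price.
    """
    return ratio_to_c90 / price_million


if __name__ == "__main__":
    # Rank systems from best to worst value under the assumed formula.
    ranked = sorted(SYSTEMS, key=lambda s: perf_per_million(s[1], s[2]),
                    reverse=True)
    for name, ratio, price in ranked:
        print(f"{name:<30} {perf_per_million(ratio, price):5.2f}")
```

Sorting by this figure puts the Fujitsu VPP300 and SGI Origin2000 at the top and the CRAY T3E at the bottom, matching the published column.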