As the timings show, actual floating-point performance depends heavily on memory access speed and on how much of the calculation can be pipelined, among other factors. Simply counting floating-point operations (flops) therefore does not always predict run time accurately. Nevertheless, a flop count is still useful as a guide. The Livermore loops counted each division and square root operation as 4 flops (circa 1990?). However, on many CPUs circa 2000-2007 it is more typical for one division or one square root to take about 10 times as long as a multiply or add. FPGAs (field-programmable gate arrays) can be designed to do better. The latest GRAPE designs used FPGAs.
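To make the two weighting conventions concrete, here is a minimal Python sketch of a weighted flop count. The weights come from the text (4 for the Livermore convention, 10 for the rough CPU figure). The function name `weighted_flops`, the dictionary names, and the operation counts in `example_counts` are hypothetical and chosen only for illustration; they do not describe any particular code or benchmark.

```python
# Cost of each operation, in units of one add or multiply.
LIVERMORE_WEIGHTS = {"add": 1, "mul": 1, "div": 4, "sqrt": 4}
CPU_2000S_WEIGHTS = {"add": 1, "mul": 1, "div": 10, "sqrt": 10}


def weighted_flops(op_counts, weights):
    """Return the weighted flop count for a dict of operation counts."""
    return sum(n * weights[op] for op, n in op_counts.items())


# Hypothetical inner loop: these operation counts are made up for illustration.
example_counts = {"add": 10, "mul": 8, "div": 1, "sqrt": 1}

livermore = weighted_flops(example_counts, LIVERMORE_WEIGHTS)
cpu_2000s = weighted_flops(example_counts, CPU_2000S_WEIGHTS)
print(livermore, cpu_2000s)  # prints "26 38"
```

For the same loop, the two conventions give noticeably different totals, so any quoted flop rate should state which weighting was used. Neither total accounts for memory access or pipelining, so both remain rough guides.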