This is a note about how parallelism is implemented in gs2, in response
to a question from Darin Ernst.
**************************************************************************
(written by Bill Dorland, 14-Jan-1999)

Most of the communication is handled by calls to subroutines in the MPI
library, which is documented at .

There are only a few significant parallel constructs used:

   send           ! Send data from one processor to another
   receive        ! Receive data on one processor that is sent from another
   broadcast      ! Send data from one processor to all
   synchronize    ! Wait for all processors before continuing
   reductions     ! Find global sum, min or max of a quantity that is
                  ! different on each processor
   identify self  ! Find out the number of the local processor

All calls to MPI subroutines in the code are contained in the module mp.
To use MPI, one needs to ensure that mp.f90 is a file with the real MPI
subroutine calls -- i.e., a copy of (or link to) mp_mpi.f90.  On some
computers, one may have to include information on the compile line to
link to the MPI library properly.  On the other hand, if one wishes to
run the code on a serial machine that does not have the MPI library
available, one should copy or link the contents of mp_stub.f90 into
mp.f90.

MPI is a very portable, widely used library from Argonne.  I recommend
using it.

SGI/Cray computers generally have MPI libraries available.  They also
support SHMEM, which is a proprietary set of routines found only on
SGI/Cray machines.  For some operations (send and receive), SHMEM
routines are faster than the analogous MPI calls.  Thus, send and
receive calls in some places in the code may be replaced by SHMEM calls
on computers that support SHMEM.  This is accomplished by ensuring that
the contents of shmem.f90 are a copy of (or link to) shmem_cray.f90.
Otherwise, one may stick to straight MPI calls by using shmem_stub.f90.
The SHMEM interface has only been checked on the T3E at NERSC.

What issues influence whether to use SHMEM?

-- SHMEM may be faster for some runs.  In fact, we included SHMEM calls
   in an attempt to optimize the code.  However, even though the SHMEM
   puts and gets are faster than the MPI sends and receives, I find that
   the difference in run time for the cases I have checked tends to be
   very small.  This can happen when the send and receive operations are
   not the bottleneck in execution.

-- SHMEM is less portable than MPI.  This is a significant drawback.
   Although the man pages on the T3E at NERSC indicate that SHMEM calls
   have the same interfaces on all SGI/Cray products, I believe Darin is
   discovering that this is not the case.  A symptom of non-portability:
   the Origin compiler fails to recognize the !DIR$ SYMMETRIC layout
   directive.  SHMEM calls are faster essentially because the
   communicated data is laid out in memory in a specific way, which is
   specified on the T3E with this syntax.  Darin reported that the
   Origin compiler does not support this directive.  Without figuring
   out how the special memory layouts are specified, one should expect
   SHMEM to fail, and therefore one should use only the more portable
   MPI library calls.

For these reasons, I recommend getting the MPI version running on a new
computer before worrying about SHMEM.  I had hoped that the SHMEM
version would run on an Origin 2000, but this appears to be problematic.
I know that the MPI version has been compiled and run successfully on
the Origin 2000 at the ACL at LANL, so I think this is the best way to
go forward.
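
To make the module-swapping scheme concrete, here is a minimal sketch of
what an mp-style wrapper around MPI might look like.  It is not the
actual mp_mpi.f90: the routine names (init_mp, broadcast, sum_reduce,
barrier, finish_mp) and argument lists are illustrative only, and it
assumes the standard MPI-1 Fortran interface provided by mpif.h.

   ! Sketch only: a minimal mp-like wrapper around MPI.  Names are
   ! illustrative, not necessarily those used in the real mp_mpi.f90.
   module mp
     implicit none
     include 'mpif.h'
     integer :: iproc = 0   ! number of the local processor ("identify self")
     integer :: nproc = 1   ! total number of processors
   contains
     subroutine init_mp
       integer :: ierror
       call mpi_init (ierror)
       call mpi_comm_rank (mpi_comm_world, iproc, ierror)
       call mpi_comm_size (mpi_comm_world, nproc, ierror)
     end subroutine init_mp

     subroutine broadcast (x)
       ! send data from processor 0 to all
       real, intent (in out) :: x
       integer :: ierror
       call mpi_bcast (x, 1, mpi_real, 0, mpi_comm_world, ierror)
     end subroutine broadcast

     subroutine sum_reduce (x)
       ! find the global sum of a quantity that differs on each processor
       real, intent (in out) :: x
       real :: xsum
       integer :: ierror
       call mpi_allreduce (x, xsum, 1, mpi_real, mpi_sum, &
            mpi_comm_world, ierror)
       x = xsum
     end subroutine sum_reduce

     subroutine barrier
       ! wait for all processors before continuing
       integer :: ierror
       call mpi_barrier (mpi_comm_world, ierror)
     end subroutine barrier

     subroutine finish_mp
       integer :: ierror
       call mpi_finalize (ierror)
     end subroutine finish_mp
   end module mp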
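
The serial stand-in (in the spirit of mp_stub.f90, again with
hypothetical names) simply provides the same interfaces as do-nothing
routines, so the rest of the code compiles and runs unchanged on a
machine without MPI:

   ! Sketch only: a serial stub with the same interfaces.  Every routine
   ! is a no-op because there is only one processor.
   module mp
     implicit none
     integer :: iproc = 0   ! the one and only processor is number 0
     integer :: nproc = 1
   contains
     subroutine init_mp
     end subroutine init_mp

     subroutine broadcast (x)
       real, intent (in out) :: x
       ! nothing to do: there is no other processor to send to
     end subroutine broadcast

     subroutine sum_reduce (x)
       real, intent (in out) :: x
       ! the local value is already the global sum
     end subroutine sum_reduce

     subroutine barrier
     end subroutine barrier

     subroutine finish_mp
     end subroutine finish_mp
   end module mp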
I'm sorry for the errors in the shmem_stub.f90 file -- I had no computer
to test it on after Peter left the program, since he was the one with
accounts at the ACL.