Although efficiency was kept in mind during all stages of the design process, no efforts have been made heretofore to assess and optimize the performance of DEGAS 2. Here, we document the optimization of the code. Some of these steps involved significant code revisions; others were effected by simple changes to source or input files. The end result is that the run time for DEGAS 2 is equal to that of EIRENE within the variations normally experienced in a time-sharing environment.
The implemented code revisions were motivated by "profiling" the code to determine which subroutines were the most time-consuming. At one point, assignments and comparisons of string variables were using significant fractions of the run time. The strings responsible were being used to describe auxiliary data associated with the atomic physics reactions. The code was modified to compile a list of all of these strings at the beginning of the run and to use an integer pointer into that list for the run-time comparisons and assignments. Because these code changes were pervasive, their impact on the run time could not be documented as carefully as the other improvements noted below. Roughly, however, they resulted in a reduction of about 10 seconds per 1000 flights.
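The string-to-integer replacement described above can be sketched as follows. This is a minimal illustration, not the DEGAS 2 implementation: the class `StringTable` and the labels `energy_loss` and `momentum_transfer` are hypothetical names chosen for the example.

```python
class StringTable:
    """Assigns each distinct string a small integer index, once."""

    def __init__(self):
        self._index = {}   # string -> integer
        self._names = []   # integer -> string

    def intern(self, name):
        """Return the integer for name, adding it to the table if it is new."""
        idx = self._index.get(name)
        if idx is None:
            idx = len(self._names)
            self._index[name] = idx
            self._names.append(name)
        return idx

    def name(self, idx):
        """Recover the original string, e.g. for output."""
        return self._names[idx]


table = StringTable()

# Setup phase: build the list once, before any flights are tracked.
# (Hypothetical labels; the actual DEGAS 2 strings are not given in the text.)
ENERGY_LOSS = table.intern("energy_loss")
MOMENTUM = table.intern("momentum_transfer")


def is_energy_loss(tag):
    """Run-time phase: an integer comparison replaces a string comparison."""
    return tag == ENERGY_LOSS
```

The strings are still available through `table.name(...)` when they are needed for output, but the inner loop only ever handles integers.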
For each code configuration, runs were performed at 1000, 2000, and 4000 flights. The time for the 1000 flight case was subtracted from the other two and the results used to estimate the incremental time required to track 1000 flights. The idea is that production runs would be substantially longer than these (with relative standard deviations on the order of 1%) so that the overhead associated with input and output would be negligible. In a few cases, production length runs (e.g., 80,000 flights) were done; for those, the run time was just taken to be the time consumed divided by 80.
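The subtraction procedure can be illustrated with a short sketch. The timings below are hypothetical (5 s of fixed overhead plus 8 s per 1000 flights), chosen only to show that the fixed overhead cancels.

```python
def incremental_seconds_per_1000(times):
    """times maps number of flights -> run time in seconds.

    The 1000-flight run is subtracted from each longer run so that fixed
    costs (input, output, setup) cancel; the remainder is normalized to
    seconds per additional 1000 flights.
    """
    base = times[1000]
    estimates = []
    for flights, seconds in sorted(times.items()):
        if flights == 1000:
            continue
        extra_thousands = (flights - 1000) / 1000
        estimates.append((seconds - base) / extra_thousands)
    return estimates


# Hypothetical timings, not measured values.
timings = {1000: 13.0, 2000: 21.0, 4000: 37.0}
estimates = incremental_seconds_per_1000(timings)
```

With these inputs both estimates come out at 8 seconds per 1000 flights, even though the raw 1000-flight run took 13 seconds.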
These runs were performed on a Sun UltraSPARC 2 running SunOS 5.5.1 and V. 4.2 of f77 with "-O4" optimization. Again, because this computer is a time-sharing system, these run times are only approximate. The estimated error is about 10%.
As presently distributed, DEGAS 2 by default sets up about 14 scores or tallies. This list was reduced to 7 at this step (later, 3 more will be eliminated leaving the neutral density and the 3 scores representing the particle, momentum, and energy transfer to the background species).
These scoring arrays were modified to contain only the non-zero scores, with an array of pointers mapping their contents back to the corresponding locations in the full-sized arrays (which were still used for the final output stage).
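A minimal sketch of this kind of compression, assuming a flat list of scores; the function names and the sample values are hypothetical and do not come from the DEGAS 2 source.

```python
def compress(full_scores):
    """Keep only the non-zero scores, plus pointers to their original slots."""
    pointers = [i for i, value in enumerate(full_scores) if value != 0.0]
    values = [full_scores[i] for i in pointers]
    return values, pointers


def expand(values, pointers, full_size):
    """Rebuild the full-sized array, as would be needed at the output stage."""
    full_scores = [0.0] * full_size
    for value, index in zip(values, pointers):
        full_scores[index] = value
    return full_scores


# Hypothetical scores: most entries are zero.
flight_scores = [0.0, 0.0, 1.5, 0.0, 0.25, 0.0, 0.0, 0.0]
values, pointers = compress(flight_scores)
restored = expand(values, pointers, len(flight_scores))
```

Work done per flight then scales with the number of non-zero entries rather than with the size of the full array.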
Further reducing the number of scores from 7 to 4 had little impact on the run time, probably because of the use of compression at this point.
DEGAS 2 (and EIRENE) employ a 10 term quadratic representation of all surfaces. In some problems, such as this one, only the linear terms are needed. The inclusion of the quadratic terms was made a compile-time option so that they could be turned off for fully linear geometries. Earlier tests of the impact of this change showed a reduction of about 4 seconds per 1000 flights. However, in this more systematic series of trials, the improvement could not be quantified with certainty.
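To make the 10-term representation concrete, here is a sketch of evaluating such a surface function. The coefficient ordering is an assumption made for this example, and a run-time flag stands in for the compile-time option described above.

```python
def surface_value(c, x, y, z, quadratic=True):
    """Evaluate a general quadric f(x, y, z); the surface is f = 0.

    Assumed coefficient order (not taken from DEGAS 2):
    c[0] + c[1]*x + c[2]*y + c[3]*z
         + c[4]*x^2 + c[5]*y^2 + c[6]*z^2
         + c[7]*x*y + c[8]*y*z + c[9]*x*z
    """
    value = c[0] + c[1] * x + c[2] * y + c[3] * z
    if quadratic:
        # These six terms are skipped entirely for fully linear geometries.
        value += (c[4] * x * x + c[5] * y * y + c[6] * z * z
                  + c[7] * x * y + c[8] * y * z + c[9] * x * z)
    return value


# The plane x = 2 needs only the linear terms.
plane = [-2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# The unit sphere needs the quadratic terms.
sphere = [-1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

on_plane = surface_value(plane, 2.0, 5.0, -3.0, quadratic=False)
on_sphere = surface_value(sphere, 1.0, 0.0, 0.0)
```

For a plane, the six quadratic coefficients are all zero, so skipping them changes nothing in the result and only removes arithmetic.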
Thus far, all of the changes made led to a final result which was the same as described previously for the run with "EIRENE physics". Most of the subsequent changes, while reducing the run time, also result in increases in the variance of the results. In a given Monte Carlo calculation, the variance is inversely proportional to the number of flights and, hence, to the run time. So, the performance figure of merit (FOM) is the variance times the run time; a lower value is better. An attempt will be made below to quantify this FOM, but we can also compare qualitatively the variance over the most relevant portion of the problem space (near the plate) via the relative standard deviation (rsd). For the baseline case, we'll use the rsd for the D density:
Note that below we'll use the rsd for the source rate of D+ due to ionization of neutrals. That rsd was not computed for this baseline run, but should be comparable except for the first one or two zones adjacent to the plate in which the contributions from D2 are significant.
The simplest ("analog") Monte Carlo simulation kills off neutrals at their first ionizing collision. However, this makes finding the solution more than one mean free path from the source difficult. The predominant non-analog improvement is to assign a weight to each neutral flight (e.g., initially 1) and reduce that weight at each step to reflect the amount of ionization which should have occurred along the way. As a result, smaller variances are obtained in low probability regions of the problem. Because each flight will thus be tracked for more steps, a run using suppressed ionization will take longer. Whether or not that time is well spent depends on the problem at hand.
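The difference between the two approaches can be sketched for a single step of a flight. This assumes simple exponential attenuation over a step of given length with a fixed ionization mean free path; the function names are hypothetical and the actual DEGAS 2 weight update is not specified in the text.

```python
import math
import random


def analog_step(path_length, ionization_mfp, rng):
    """Analog: the flight either survives the step at full weight or ends.

    Returns True if the neutral survives.
    """
    return rng.random() < math.exp(-path_length / ionization_mfp)


def suppressed_step(weight, path_length, ionization_mfp):
    """Suppressed ionization: the flight always continues, but its weight
    is reduced by the fraction expected to have been ionized on the way."""
    return weight * math.exp(-path_length / ionization_mfp)


# Three steps of one mean free path each, starting from weight 1.
weight = 1.0
for _ in range(3):
    weight = suppressed_step(weight, 1.0, 1.0)

# Analog flights surviving one such step, for comparison.
rng = random.Random(0)
n_flights = 20000
survivors = sum(analog_step(1.0, 1.0, rng) for _ in range(n_flights))
survival_fraction = survivors / n_flights
```

After three mean free paths the weighted flight still contributes (with weight near 0.05), whereas only about 5% of analog flights would reach that depth at all. That is why the weighted approach gives smaller variances far from the source, at the cost of tracking every flight for more steps.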
EIRENE does not currently use suppressed ionization. For this comparison, we turned this feature off in DEGAS 2.
Here is the resulting rsd:
Some of the scores in DEGAS 2, e.g., the neutral density and the D+ source rate, are computed via the track-length estimator. The other scores are compiled using data only at collisions (though most of these can also be done by track-length). Since the collision routines get executed regardless of whether or not the resulting data are used in the scores, we felt that it would be interesting to switch all of the estimators to collision (except density) and eliminate the overhead associated with generating the track-length scores.
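The two estimators can be compared in a deliberately simple setting: a single zone of given thickness containing a pure absorber, with each flight entering normally at unit weight. This is an illustrative sketch under those assumptions, not the DEGAS 2 scoring code. The track-length estimator scores the path length in the zone; the collision estimator scores 1/sigma only if a collision occurs in the zone.

```python
import math
import random


def simulate(n_flights, sigma, thickness, seed=0):
    """Return per-flight scores from both estimators for the same flights."""
    rng = random.Random(seed)
    track_scores, collision_scores = [], []
    for _ in range(n_flights):
        # Distance to first collision; 1 - random() lies in (0, 1].
        distance = -math.log(1.0 - rng.random()) / sigma
        # Track-length estimator: path length actually travelled in the zone.
        track_scores.append(min(distance, thickness))
        # Collision estimator: 1/sigma, but only if the collision is in the zone.
        collision_scores.append(1.0 / sigma if distance < thickness else 0.0)
    return track_scores, collision_scores


def mean_and_variance(scores):
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, variance


sigma, thickness = 1.0, 0.5
track, collision = simulate(50000, sigma, thickness)
track_mean, track_var = mean_and_variance(track)
coll_mean, coll_var = mean_and_variance(collision)
expected = (1.0 - math.exp(-sigma * thickness)) / sigma
```

Both estimators converge to the same mean, but in this thin zone the collision estimator has the larger per-flight variance because most flights pass through without colliding and score nothing.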
Of course, this change increased the variance:
Two of the seven reactions involving D2 and D2+ in DEGAS 2 result in two D atoms. DEGAS 2 normally tracks both of these. EIRENE, however, like the original DEGAS, chooses one of the two to follow and "kills" off the other. This is a simple example of a general nonanalog Monte Carlo technique known as "Russian roulette". Whether the use (or non-use) of this technique improves the variance again depends on the problem. For the purposes of this benchmark, Russian roulette was added to DEGAS 2's molecular dissociation routine. Those scores switched to collision estimator in the previous section were reverted to the track-length estimator for this configuration.
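A sketch of the selection step follows. The text does not say how the survivor's weight is adjusted; this example assumes the usual unbiased choice, in which one of the two equal-weight products is picked at random and carries the combined weight. The labels `D_a` and `D_b` are hypothetical.

```python
import random


def roulette_products(products, rng):
    """products: list of (label, weight) pairs with equal weights.

    One product is chosen uniformly at random to be followed; its weight
    is multiplied by the number of products so the expected total weight
    is unchanged. (Assumed weighting; not stated in the source.)
    """
    label, weight = rng.choice(products)
    return label, weight * len(products)


rng = random.Random(1)
# Dissociation of a molecule of weight 1 into two atoms, modelled here
# as two products of weight 0.5 each.
products = [("D_a", 0.5), ("D_b", 0.5)]
survivor_label, survivor_weight = roulette_products(products, rng)
```

Only one flight continues, which halves the tracking work for these reactions; the price is additional statistical noise, since the result now depends on which product happened to be chosen.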
Here is the resulting rsd:
We now have four different configurations with which to evaluate this "figure of merit", variance times run time. Presently, there is no single "answer" in these simulations which we can use to compute the FOM. Somewhat arbitrarily, we have selected a region of the problem spanning the width (radius) of this geometry and encompassing most of the integrated D+ source (the selection was made by eyeballing the plot; 83% of the integrated source was included). The rsd for each configuration was divided by that of the baseline and integrated over this region. We then defined the FOM as the product of the 1000 flight run times (quoted above) and the square (to get the variance) of this ratio.
| Configuration | Seconds / 1000 flights | RSD Ratio | FOM |
| No Ionization Suppression | 15 | 1.9 | 54 |
| D2 Russian Roulette | 8 | 2.3 | 41 |
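As a worked check of the definition, the FOM for the "No Ionization Suppression" row can be recomputed from its tabulated run time and RSD ratio. The function name is hypothetical. Because the tabulated ratios are rounded to two digits, recomputing from them need not reproduce every tabulated FOM exactly.

```python
def figure_of_merit(seconds_per_1000, rsd_ratio):
    """FOM = run time x (rsd ratio)^2; squaring the ratio gives a variance ratio."""
    return seconds_per_1000 * rsd_ratio ** 2


# Values from the "No Ionization Suppression" row.
fom = figure_of_merit(15, 1.9)
```

Here `fom` evaluates to about 54, matching the table.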
Clearly, using the collision estimator is not a good idea. The effectiveness of suppressed ionization could go either way. However, doing Russian roulette on the molecular products (this run was done without ionization suppression) looks to be the overall winner. Probably not by accident, this configuration is the closest to the default mode of operation for EIRENE.
Of course, the whole point of this portion of the benchmark is to compare the performance of DEGAS 2 against that of EIRENE. Apart from the addition of the rsd output to EIRENE, no other modifications have been made to its default mode of operation. However, the compiler optimization was changed from the "-O3" value specified with the IPP-Garching version to the "-O4" which appears to work well for DEGAS 2 with the new Sun FORTRAN 77 compiler. Over several runs of length 1000 to 10000 flights, the incremental time for 1000 flights is:
The relative standard deviations computed by EIRENE were consistent with those of the corresponding DEGAS 2 configuration (Russian roulette), although a direct comparison was not made. Here are the fractional differences for the D density and the D+ source relative to EIRENE. Compare these with the corresponding plots from the previous page of the density and the ion source rate.
So, to within normal variations, the two codes are now running at the same speed. The particular DEGAS 2 configuration should be chosen according to the needs of the problem at hand.
Keep in mind that the original objective of DEGAS 2 was to be faster than DEGAS; the hope was that the code would yield performance comparable to that of EIRENE while retaining flexibility. That objective has been met; the substantial benefits of parallel processing will be realized in the near future. The advantages of dynamic memory allocation of all run-time arrays, another design feature of DEGAS 2, have also been demonstrated during this benchmark. The run-time sizes were: DEGAS 2 - 7 MB, EIRENE - 142 MB. It should be noted that the IPP-Garching version of EIRENE used for these benchmarks is significantly older than that currently available from Reiter.
Finally, the nearly factor-of-20 improvement in the speed of DEGAS 2 attests to the efficacy of profiling and to the impact of even small algorithm changes.
Following some suggestions from Reiter, two modifications were made to the EIRENE runs: