DEGAS 2 | Previous

PERFORMANCE OPTIMIZATION AND COMPARISON

Although efficiency was kept in mind during all stages of the design process, no efforts have been made heretofore to assess and optimize the performance of DEGAS 2. Here, we document the optimization of the code. Some of these steps involved significant code revisions; others were effected by simple changes to source or input files. The end result is that the run time for DEGAS 2 is equal to that of EIRENE within the variations normally experienced in a time-sharing environment.

Strings

The implemented code revisions were motivated by "profiling" the code to determine which subroutines were the most time-consuming. At one point, assignments and comparisons of string variables were using a significant fraction of the run time. The strings responsible were being used to describe auxiliary data associated with the atomic physics reactions. The code was modified to compile a list of all of these strings at the beginning of the run and to use an integer pointer into that list for the run-time comparisons and assignments. Because these code changes were pervasive, their impact on the run time could not be documented as carefully as that of the other improvements noted below. However, they reduced the run time by roughly 10 seconds per 1000 flights.
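A minimal Python sketch of the string-to-pointer idea follows. The class name `StringTable` and the sample labels are hypothetical; the sketch assumes only what the text describes: the list is built once at setup, and run-time work compares integer pointers instead of strings.

```python
class StringTable:
    """Map each distinct string to a small integer pointer (hypothetical sketch)."""

    def __init__(self):
        self._index = {}    # string -> integer pointer
        self._strings = []  # integer pointer -> string

    def intern(self, s):
        """Return the pointer for s, appending s to the list if it is new."""
        if s not in self._index:
            self._index[s] = len(self._strings)
            self._strings.append(s)
        return self._index[s]

    def lookup(self, i):
        """Recover the original string, e.g., when writing output."""
        return self._strings[i]


# Setup phase: build the list once, before any flights are tracked.
table = StringTable()
labels = ["reaction rate", "energy loss", "reaction rate"]
pointers = [table.intern(s) for s in labels]

# Run-time phase: an integer comparison replaces a string comparison.
print(pointers)                    # [0, 1, 0]
print(pointers[0] == pointers[2])  # True
```

The string list is still available for output, so nothing is lost by doing the run-time bookkeeping with integers.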

Starting Point

The starting configuration was the same as that used for the benchmarks up to this point (specifically, DEGAS 2 V. 1.8a). The "EIRENE physics" set was used to minimize differences due to the number of collisions and to permit direct comparisons of the variances.

For each code configuration, runs were performed at 1000, 2000, and 4000 flights. The time for the 1000-flight case was subtracted from those of the other two, and the results were used to estimate the incremental time required to track 1000 flights. The idea is that production runs would be substantially longer than these (with relative standard deviations on the order of 1%), so that the overhead associated with input and output would be negligible. In a few cases, production-length runs (e.g., 80,000 flights) were done; for those, the time per 1000 flights was simply taken to be the total time consumed divided by 80.
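The timing arithmetic can be sketched as follows. The function names are hypothetical, and the timings in the example are made-up illustrations, not measured values. Subtracting the 1000-flight run cancels the fixed input/output overhead, leaving only the cost of the extra flights:

```python
def incremental_time_per_1000(times):
    """Estimate the time to track 1000 flights from runs of several lengths.

    `times` maps number of flights -> total run time in seconds. The
    1000-flight run is subtracted from each longer run, which cancels the
    fixed input/output overhead common to all of them.
    """
    base = times[1000]
    estimates = []
    for n, t in sorted(times.items()):
        if n > 1000:
            extra_thousands = (n - 1000) / 1000
            estimates.append((t - base) / extra_thousands)
    return estimates


def time_per_1000_from_long_run(total_seconds, n_flights):
    """Production-length run: overhead is neglected, so divide by n / 1000."""
    return total_seconds / (n_flights / 1000)


# Hypothetical timings: 20 s of overhead plus 49 s per 1000 flights.
runs = {1000: 69.0, 2000: 118.0, 4000: 216.0}
print(incremental_time_per_1000(runs))  # [49.0, 49.0]
```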

These runs were performed on a Sun UltraSPARC 2 running SunOS 5.5.1 and V. 4.2 of f77 with "-O4" optimization. Again, because this computer is a time-sharing system, these run times are only approximate. The estimated error is about 10%.

Baseline: 136 seconds per 1000 flights

Charge Exchange Rejection

The charge exchange cross section depends on the relative ion-neutral velocity. However, collisions are decided upon using a cross section averaged over a Maxwellian distribution. In the baseline, the ion collision partners are chosen so as to be consistent with the actual cross section using the "rejection technique"; this process entails sampling several ions before finding one which is satisfactory. EIRENE doesn't do this (neither does the original DEGAS). For the first change, charge exchange rejection was turned off. A cursory examination of the results shows no significant impact on the neutral density. However, a stronger effect may be occurring elsewhere (e.g., energy transferred to background species).
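For readers unfamiliar with the rejection technique, here is a minimal sketch of how an ion partner can be chosen consistently with a velocity-dependent cross section. It is not the DEGAS 2 routine: it assumes a stationary Maxwellian ion distribution and a hypothetical cross section sigma(g) = sigma0 / (1 + g), chosen only because sigma(g) g is then bounded by sigma0. Candidate ions are drawn from the Maxwellian and accepted with probability sigma(g) g divided by that bound, so the accepted ions follow the collision-weighted distribution; the returned count shows why several ions may be sampled per event.

```python
import math
import random


def sample_cx_partner(v_neutral, thermal_speed, sigma_g, bound, rng):
    """Pick the ion partner for a charge-exchange event by rejection.

    v_neutral     : neutral velocity (vx, vy, vz)
    thermal_speed : per-component standard deviation sqrt(kT/m) of the
                    stationary Maxwellian ions (no flow, an assumption)
    sigma_g       : function returning sigma(g) * g for relative speed g
    bound         : an upper bound on sigma_g(g) for all g
    Returns (ion_velocity, number_of_ions_sampled).
    """
    tries = 0
    while True:
        tries += 1
        # Candidate ion drawn from the Maxwellian.
        v_ion = tuple(rng.gauss(0.0, thermal_speed) for _ in range(3))
        g = math.dist(v_neutral, v_ion)  # relative ion-neutral speed
        # Accept with probability sigma(g) * g / bound.
        if rng.random() * bound < sigma_g(g):
            return v_ion, tries


def sigma_g_example(g, sigma0=1.0):
    """Hypothetical cross section sigma0 / (1 + g), so sigma(g) * g < sigma0."""
    return sigma0 * g / (1.0 + g)


rng = random.Random(42)
samples = [sample_cx_partner((1.0, 0.0, 0.0), 1.0, sigma_g_example, 1.0, rng)
           for _ in range(2000)]
mean_tries = sum(t for _, t in samples) / len(samples)
```

Turning rejection off corresponds to always accepting the first candidate, which saves the extra samples at the cost of ignoring the velocity dependence of the cross section when choosing the partner.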

Disable CX rejection: 119 seconds

Reduce Number of Scores

One impressive feature of EIRENE is how thoroughly its default operation has been pared down to the minimum necessary for coupling to the fluid plasma codes. In particular, no data on the variances are kept. DEGAS 2 was written with the philosophy that the mean value for a score is meaningless without a corresponding variance, and the two data values are kept together. For this comparison, a short list of variances was requested in the EIRENE input file. These will be needed later.

As presently distributed, DEGAS 2 by default sets up about 14 scores or tallies. This list was reduced to 7 at this step (later, 3 more will be eliminated leaving the neutral density and the 3 scores representing the particle, momentum, and energy transfer to the background species).

Cut number of scores to 7: 95 seconds

Compression of Scores

The first offender in the profiling exercise was the routine responsible for compiling the scores accumulated during a single flight into the global total. As originally written, both the total and incremental scoring arrays were full-sized (roughly the number of zones times the number of scores times the number of background species). However, each flight visits only a small fraction of the problem space, and this routine was spending a significant amount of time adding 0 to 0.

These scoring arrays were modified to contain only the non-zero scores, with an array of pointers mapping their contents back to the corresponding locations in the full-sized arrays (which were still used for the final output stage).

Compressed scores: 49 seconds
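The compression can be illustrated with the following sketch. The class and method names are hypothetical, and it compresses only the per-flight tally, which is where the wasted additions of 0 to 0 occurred. Each nonzero entry is keyed by its index into the full-sized array, which plays the role of the pointer array, and sums of squares are kept alongside the sums so that a variance is available at the end:

```python
from collections import defaultdict


class CompressedTally:
    """Sketch of per-flight score compression (hypothetical names).

    During a flight only nonzero entries are stored, keyed by their
    position in the full-sized array. At the end of the flight only those
    entries are merged into the global sums.
    """

    def __init__(self, size):
        self.total = [0.0] * size        # sum over flights of each flight's score
        self.total_sq = [0.0] * size     # sum over flights of squared flight scores
        self.flight = defaultdict(float)  # index -> score for the current flight

    def score(self, index, value):
        """Accumulate a contribution for the current flight."""
        self.flight[index] += value

    def end_flight(self):
        """Merge the compressed per-flight scores into the global sums."""
        for index, value in self.flight.items():
            self.total[index] += value
            self.total_sq[index] += value * value
        self.flight.clear()

    def mean_and_variance(self, n_flights):
        """Mean per flight and the estimated variance of that mean."""
        means, variances = [], []
        for s, s2 in zip(self.total, self.total_sq):
            m = s / n_flights
            means.append(m)
            variances.append((s2 / n_flights - m * m) / n_flights)
        return means, variances


tally = CompressedTally(size=5)
tally.score(1, 2.0)
tally.score(1, 1.0)
tally.score(3, 0.5)
tally.end_flight()
tally.score(1, 1.0)
tally.end_flight()
means, variances = tally.mean_and_variance(2)
print(tally.total)  # [0.0, 4.0, 0.0, 0.5, 0.0]
```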

Other Changes with Minimal Effects

Further reducing the number of scores from 7 to 4 had little impact on the run time, probably because the scores were already being compressed at this point.

DEGAS 2 (and EIRENE) employ a 10-term quadratic representation of all surfaces. In some problems, such as this one, only the linear terms are needed. The inclusion of the quadratic terms was made a compile-time option so that they could be turned off for fully linear geometries. Earlier tests of this change showed a reduction of about 4 seconds per 1000 flights. However, in this more systematic series of trials, the improvement could not be quantified with certainty.
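The cost difference between linear and quadratic surfaces shows up when computing where a flight crosses a surface. The sketch below assumes a hypothetical coefficient ordering for the 10-term form (the actual DEGAS 2 ordering may differ) and uses a run-time flag in place of the compile-time option. Substituting the ray p + t d into f gives A t^2 + B t + C; for linear surfaces A = 0 and the quadratic-term work can be skipped.

```python
import math

# Hypothetical coefficient order for the 10-term surface function
#   f(x, y, z) = c0 + c1 x + c2 y + c3 z
#              + c4 x^2 + c5 y^2 + c6 z^2 + c7 xy + c8 yz + c9 zx
# A point lies on the surface where f = 0.


def surface_value(c, p):
    """Evaluate the 10-term surface function at point p = (x, y, z)."""
    x, y, z = p
    return (c[0] + c[1] * x + c[2] * y + c[3] * z
            + c[4] * x * x + c[5] * y * y + c[6] * z * z
            + c[7] * x * y + c[8] * y * z + c[9] * z * x)


def intersect_distance(c, p, d, linear_only=False):
    """Smallest positive t with f(p + t d) = 0, or None if there is none.

    linear_only=True stands in for the compile-time option: it skips the
    quadratic terms and is valid only when c4..c9 are all zero.
    """
    x, y, z = p
    u, v, w = d
    # Substituting p + t d into f gives A t^2 + B t + C.
    C = surface_value(c, p)
    B = c[1] * u + c[2] * v + c[3] * w
    A = 0.0
    if not linear_only:
        B += (2.0 * (c[4] * x * u + c[5] * y * v + c[6] * z * w)
              + c[7] * (x * v + y * u) + c[8] * (y * w + z * v)
              + c[9] * (z * u + x * w))
        A = (c[4] * u * u + c[5] * v * v + c[6] * w * w
             + c[7] * u * v + c[8] * v * w + c[9] * w * u)
    if A == 0.0:  # linear in t: a single crossing, or none if parallel
        if B == 0.0:
            return None
        t = -C / B
        return t if t > 0.0 else None
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in sorted(((-B - root) / (2.0 * A), (-B + root) / (2.0 * A))):
        if t > 0.0:
            return t
    return None


plane_x2 = [-2.0, 1.0] + [0.0] * 8  # x - 2 = 0
unit_sphere = [-1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
origin, plus_x = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
```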

Variance-Altering Changes

Thus far, all of the changes made led to a final result which was the same as that described previously for the run with "EIRENE physics". Most of the subsequent changes, while reducing the run time, also increase the variance of the results. In a given Monte Carlo calculation, the variance is inversely proportional to the number of flights and, hence, to the run time. So, the performance Figure of Merit (FOM) is the variance times the run time. An attempt will be made below to quantify this FOM, but we can also qualitatively compare the variance over the most relevant portion of the problem space (near the plate) via the relative standard deviation (rsd). For the baseline case, we'll use the rsd for the D density:

[Figure: rsd of the D density, baseline case (spD_density_rsd_ex.ps)]

Note that below we'll use the rsd for the source rate of D+ due to ionization of neutrals. That rsd was not computed for this baseline run, but should be comparable except for the first one or two zones adjacent to the plate in which the contributions from D2 are significant.
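The scaling argument behind using variance times run time as the figure of merit can be written out. Here C (the variance contributed by a single flight) and tau (the time per flight) are symbols introduced for this sketch; both depend on the problem and the code configuration, but not on the number of flights N:

```latex
\sigma^2(N) = \frac{C}{N}, \qquad T(N) = \tau N
\quad\Longrightarrow\quad
\mathrm{FOM} = \sigma^2(N)\, T(N) = C\,\tau .
```

Because N cancels, the figure of merit compares configurations independently of run length, and a smaller value means a given variance is reached in less time. Since the rsd is the standard deviation divided by the mean, which does not depend on the configuration, the ratio of squared rsds between two configurations equals the ratio of their variances.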

Removal of Suppressed Ionization

The simplest ("analog") Monte Carlo simulation kills off neutrals at their first ionizing collision. However, this makes finding the solution more than one mean free path from the source difficult. The predominant non-analog improvement is to assign a weight to each neutral flight (e.g., initially 1) and reduce that weight at each step to reflect the amount of ionization which should have occurred along the way. As a result, smaller variances are obtained in low probability regions of the problem. Because each flight will thus be tracked for more steps, a run using suppressed ionization will take longer. Whether or not that time is well spent depends on the problem at hand.
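A minimal sketch contrasting the two treatments on a single track segment follows. The function names and the three-zone test problem are hypothetical, and the sketch omits the other collision types a real flight would undergo. In the analog version a neutral either ionizes on the segment or passes with its weight intact; with suppressed ionization it always passes, and its weight is multiplied by exp(-sigma_ion * length).

```python
import math
import random


def analog_segment(weight, sigma_ion, length, rng):
    """Analog treatment of one track segment.

    The neutral ionizes on the segment with probability
    1 - exp(-sigma_ion * length); if so, the flight ends (weight -> 0).
    Returns (weight after the segment, weight scored as ionization).
    """
    if rng.random() < math.exp(-sigma_ion * length):
        return weight, 0.0
    return 0.0, weight


def suppressed_segment(weight, sigma_ion, length):
    """Suppressed ionization: the neutral always survives the segment, and
    its weight is reduced by the fraction that should have ionized.
    Returns (weight after the segment, weight scored as ionization)."""
    new_weight = weight * math.exp(-sigma_ion * length)
    return new_weight, weight - new_weight


# Hypothetical problem: three zones, each one ionization mean free path thick.
zones = [(1.0, 1.0)] * 3  # (sigma_ion, length) per zone

# Suppressed ionization: every flight reaches the far side with weight e^-3.
w = 1.0
for sigma, length in zones:
    w, _ = suppressed_segment(w, sigma, length)

# Analog: only about 5% of flights get there, each carrying weight 1.
rng = random.Random(0)
n = 20000
arrived = 0.0
for _ in range(n):
    wa = 1.0
    for sigma, length in zones:
        wa, _ = analog_segment(wa, sigma, length, rng)
        if wa == 0.0:
            break
    arrived += wa
```

Both versions have the same expected transmitted weight, but in this toy the suppressed version has no spread at all across flights, at the cost of tracking every flight through every zone.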

EIRENE does not currently use suppressed ionization. For this comparison, we turned this feature off in DEGAS 2.

Without Suppressed Ionization: 15 seconds

Here is the resulting rsd:

[Figure: rsd of the D+ ionization source without suppressed ionization (D__Ion_Source_is_rsd.ps)]

Collision Estimator

Some of the scores in DEGAS 2, e.g., the neutral density and the D+ source rate, are computed via the track-length estimator. The other scores are compiled using data only at collisions (though most of these could also be done by track-length). Since the collision routines are executed regardless of whether or not the resulting data are used in the scores, we felt that it would be interesting to switch all of the estimators except the density to the collision estimator and eliminate the overhead associated with generating the track-length scores.
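To illustrate the difference between the two estimators, the sketch below uses a deliberately simple toy problem rather than DEGAS 2's geometry: neutrals start at x = 0, move in +x through a slab in which ionization is the only process, and are followed analog until they ionize. Both estimators tally the expected number of ionizations per zone, whose exact value for a zone [a, b) is exp(-sigma a) - exp(-sigma b). The function name is hypothetical.

```python
import math
import random


def estimate_zone_ionization(sigma, edges, n_flights, seed=0):
    """Toy comparison of track-length and collision estimators.

    Zones are [edges[k], edges[k+1]). Returns (track_length, collision)
    estimates of the expected number of ionizations per zone per flight.
    """
    rng = random.Random(seed)
    nz = len(edges) - 1
    track = [0.0] * nz
    coll = [0.0] * nz
    for _ in range(n_flights):
        x_end = rng.expovariate(sigma)  # analog distance to the ionization
        for k in range(nz):
            a, b = edges[k], edges[k + 1]
            # Track-length estimator: sigma times the path length in the zone.
            overlap = max(0.0, min(b, x_end) - a)
            track[k] += sigma * overlap
            # Collision estimator: score only in the zone of the collision.
            if a <= x_end < b:
                coll[k] += 1.0
    return [t / n_flights for t in track], [c / n_flights for c in coll]


edges = [0.0, 0.5, 1.0, 2.0, 4.0]
tl, col = estimate_zone_ionization(1.0, edges, 20000)
exact = [math.exp(-a) - math.exp(-b) for a, b in zip(edges, edges[1:])]
```

Both estimators are unbiased here, but the track-length estimator spreads each flight's contribution over every zone it crosses, while the collision estimator scores in one zone only; this is why the collision estimator typically gives a larger variance for the same number of flights.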

Collision Estimator: 10 seconds

Of course, this change increased the variance:

[Figure: rsd of the D+ ionization source with the collision estimator (D__Ion_Source_col_rsd.ps)]

Russian Roulette for Molecules

Two of the seven reactions involving D2 and D2+ in DEGAS 2 result in two D atoms. DEGAS 2 normally tracks both of these. EIRENE, however, like the original DEGAS, chooses one of the two to follow and "kills" off the other. This is a simple example of a general non-analog Monte Carlo technique known as "Russian roulette". Whether the use (or non-use) of this technique improves the variance again depends on the problem. For the purposes of this benchmark, Russian roulette was added to DEGAS 2's molecular dissociation routine. The scores that were switched to the collision estimator in the previous section were reverted to the track-length estimator for this configuration.
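A sketch of the two treatments of a dissociation that yields two D atoms follows. The function names are hypothetical, and giving the surviving atom the combined weight of both products is one standard way of keeping the expected scores unchanged, assumed here rather than taken from either code:

```python
import random


def dissociate_analog(weight, products):
    """Follow every product atom; each carries the parent flight's weight."""
    return [(atom, weight) for atom in products]


def dissociate_roulette(weight, products, rng):
    """Follow one product chosen at random and drop the others.

    The survivor carries the weight of all the products, so the expected
    weight assigned to each atom (and hence each expected score) matches
    the analog case. This weight convention is an assumption of the sketch.
    """
    survivor = rng.choice(products)
    return [(survivor, weight * len(products))]


# Hypothetical D2 dissociation into two D atoms with different velocities.
atoms = ["D (fast)", "D (slow)"]
rng = random.Random(7)
n = 10000
weight_on_fast = 0.0
for _ in range(n):
    for atom, w in dissociate_roulette(1.0, atoms, rng):
        if atom == "D (fast)":
            weight_on_fast += w
# Analog: "D (fast)" always gets weight 1.0; roulette: 2.0 about half the time.
```

Only one atom is tracked per dissociation, which saves time, while the randomness in which atom survives adds variance; which effect wins depends on the problem.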

Russian Roulette on D2: 8 seconds

Here is the resulting rsd:

[Figure: rsd of the D+ ionization source with Russian roulette on D2 (D__Ion_Source_rr_rsd.ps)]

Figures of Merit

We now have four different configurations with which to evaluate this "figure of merit", variance times run time. Presently, there is no single "answer" in these simulations which we can use to compute the FOM. Somewhat arbitrarily, we have selected a region of the problem spanning the width (radius) of this geometry and encompassing most of the integrated D+ source (the selection was made by eyeballing the plot; 83% of the integrated source was included). The rsd for each configuration was divided by that of the baseline and integrated over this region. We then defined the FOM as the product of the run time per 1000 flights (quoted above) and the square of this ratio (squaring converts the rsd ratio into a variance ratio).

Configuration               Seconds / 1000 flights   RSD ratio   FOM
Baseline                    49                       1           49
No Ionization Suppression   15                       1.9         54
Collision Estimator         10                       4.3         185
D2 Russian Roulette         8                        2.3         41

Clearly, using the collision estimator is not a good idea. The effectiveness of suppressed ionization could go either way. However, doing Russian roulette on the molecular products (this run was done without ionization suppression) looks to be the overall winner. Probably not by accident, this configuration is the closest to the default mode of operation for EIRENE.
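The figure-of-merit arithmetic can be sketched as follows. The function name is hypothetical, and the plain average of zone-by-zone ratios is an assumption standing in for the region integral described above:

```python
def figure_of_merit(seconds_per_1000, rsd_config, rsd_baseline):
    """Run time times the squared rsd ratio (lower is better).

    rsd_config, rsd_baseline: rsd in each zone of the selected region.
    Assumption: the region-wide ratio is the plain average of the
    zone-by-zone ratios (no area or volume weighting).
    """
    ratios = [c / b for c, b in zip(rsd_config, rsd_baseline)]
    ratio = sum(ratios) / len(ratios)
    # Squaring the rsd ratio converts it into a ratio of variances.
    return seconds_per_1000 * ratio ** 2


# Applied directly to the rounded ratios from the table.
for name, seconds, ratio in [("Baseline", 49, 1.0),
                             ("No Ionization Suppression", 15, 1.9),
                             ("Collision Estimator", 10, 4.3),
                             ("D2 Russian Roulette", 8, 2.3)]:
    print(f"{name:28s} {figure_of_merit(seconds, [ratio], [1.0]):6.1f}")
```

Applied to the rounded ratios in the table, this reproduces the tabulated FOMs to within the rounding of the ratios.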

EIRENE Performance

Of course, the whole point of this portion of the benchmark is to compare the performance of DEGAS 2 against that of EIRENE. Apart from the addition of the rsd output to EIRENE, no other modifications have been made to its default mode of operation. However, the compiler optimization was changed from the "-O3" value specified with the IPP-Garching version to "-O4", which appears to work well for DEGAS 2 with the new Sun FORTRAN 77 compiler. Over several runs of length 1000 to 10000 flights, the incremental time for 1000 flights is:

EIRENE: 12 seconds

The relative standard deviations computed by EIRENE were consistent with those of the corresponding DEGAS 2 configuration (Russian roulette), although a direct comparison was not made. Here are the fractional differences for the D density and the D+ source relative to EIRENE. Compare these with the corresponding plots from the previous page of the density and the ion source rate.

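[Figure: fractional difference in the D density, DEGAS 2 (Russian roulette) relative to EIRENE (diff_D_den_rr_eir.ps)]
[Figure: fractional difference in the D+ source, DEGAS 2 (Russian roulette) relative to EIRENE (diff_rr_eir_sni.ps)]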

So, to within normal variations, the two codes are now running at the same speed. The particular DEGAS 2 configuration should be chosen according to the needs of the problem at hand.

Keep in mind that the original objective of DEGAS 2 was to be faster than DEGAS; the hope was that the code would yield performance comparable to that of EIRENE while retaining flexibility. That objective has been met; the substantial benefits of parallel processing will be realized in the near future. The advantages of dynamic memory allocation of all run-time arrays, another design feature of DEGAS 2, have also been demonstrated during this benchmark. The run-time sizes were 7 MB for DEGAS 2 and 142 MB for EIRENE. It should be noted that the IPP-Garching version of EIRENE used for these benchmarks is significantly older than the one currently available from Reiter.

Finally, the remarkable improvement of almost a factor of 20 in the speed of DEGAS 2 (from 136 to 8 seconds per 1000 flights) attests to the efficacy of profiling and to the impact of even small algorithm changes.

Update on EIRENE Performance

Following some suggestions from Reiter, two modifications were made to the EIRENE runs:

  1. EIRENE's computation of variances is less than optimal. They can be turned off via the input file. Furthermore, unneeded tallies can be disabled. Doing so for this problem yields an incremental time for 1000 flights of:
    EIRENE: 3 seconds
    At this point, EIRENE is substantially faster than DEGAS 2, since turning off DEGAS 2's variance computation (which has already been optimized) would not have a significant impact. We will revisit this performance comparison while we are doing the benchmarking with a realistic geometry.
  2. By reducing the dimensioning parameters in EIRENE to values more appropriate for this input file, the size of the code was reduced from 142 MB to 55 MB.

