Numerical Algorithms used in GS2
********************************
Greg Hammett, Aug. 25, 1999

The main numerical algorithms used in GS2 are described in the KRT paper
(Kotschenreuther, Rewoldt, Tang, CPC 88, 128 (1995)). Of particular interest
is the implicit method used to solve the gyrokinetic equation.

The linear gyrokinetic equation is an integro-differential equation involving
f(E,xi,theta,s), the distribution function as a function of energy, pitch
angle, theta, and species index s, with integrals of f appearing in the
gyrokinetic Maxwell's equations for the fields. Kotschenreuther's algorithm is
fully implicit, for both f and the fields (which are related to integrals of
f). A brute-force inversion of the resulting Ntot x Ntot matrix, where

    Ntot = (N_E * N_xi * N_s + N_fields) * N_theta

would require Ntot**3 operations for the first inversion, and Ntot**2
operations for each subsequent solution. Kotschenreuther's algorithm makes use
of some properties of the equations to break the inversion into two parts:
one is effectively an implicit solver for the fields, the other for f (there
are no approximations here; the solution is still fully implicit). The
resulting compute time should scale as (N_fields*N_theta)**3 for the initial
inversion of some matrices, plus

    C1*(N_fields*N_theta)**2 + C2*(N_E * N_xi * N_theta * N_s)

per subsequent time step, where N_fields is the number of fields (1 for
electrostatic, 3 for fully electromagnetic), N_theta is the total number of
theta grid points (ntheta*(2*nperiod-1) in terms of the namelist input
parameters), and C1 and C2 are constants. [I'm not sure whether the code
actually follows the N_fields scaling or not...] This method of breaking up
the inversions leads to a very fast implicit algorithm, and is a pretty neat
trick if you ask me...
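The flavor of this two-part solve can be seen in a toy model. This is a
sketch, not the GS2 implementation: here f is just a vector damped at a
made-up rate a, and the single scalar "field" phi is the mean of f, but the
trick of precomputing the response of f to a unit field and then solving a
small field equation, instead of inverting the full coupled system, is the
same in miniature:

```python
# Toy sketch (NOT the GS2 code) of a split fully implicit solve.
# Assumed model:  f_new = f_old - dt*(a*f_new + b*phi_new),
#                 phi_new = mean(f_new).
# Rather than inverting the full coupled (n+1)x(n+1) matrix, write
# f_new = f_h + r*phi_new (homogeneous piece plus response to a unit
# field), then solve a small (here scalar) equation for phi_new.

def split_implicit_step(f_old, dt, a=1.0, b=0.5):
    n = len(f_old)
    denom = 1.0 + dt * a
    # Homogeneous solution (field set to zero) and unit-field response:
    f_h = [x / denom for x in f_old]
    r = -dt * b / denom
    # Field equation: phi_new = mean(f_h) + r*phi_new  ->  solve for phi_new
    phi_new = (sum(f_h) / n) / (1.0 - r)
    f_new = [x + r * phi_new for x in f_h]
    return f_new, phi_new
```

The answer is exactly the fully implicit one; only the order of operations
changes, which is where the favorable operation-count scaling comes from.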
************************************************************
A general implicit finite-difference time-advancement scheme for

    df/dt = -L f

where L is some linear operator (in our case, f is equivalent to a vector and
L to a big matrix), is

    (f(i+1) - f(i))/dt = -L (delta * f(i+1) + (1-delta) * f(i))

This is like Eqs. 11-14 in the KRT paper, where delta=0 is fully explicit,
delta=1 is fully implicit, and delta=0.5 is a standard time-centered
implicit, second-order accurate scheme. GS2 allows different values of delta
for different species (though I've only thought about a constant delta for
all species). [GS2 also allows delta to be complex, though D&K can't remember
why complex values were explored once, and we will assume it is real here.]
In terms of the namelist input values fexpr and fexpi:

    delta = 1 - (fexpr + i * fexpi)

(note that fexpr=fexpi=0 corresponds to fully implicit, fexpr=0.5 corresponds
to a standard time-centered implicit method, and fexpr=0.4 corresponds to the
slightly off-centered value typically used in the KRT paper).

For fully implicit delta=1,

    f(i+1) = 1/(1 + dt*L) f(i)

We can break up f into eigenvectors of L and consider each eigenvector
independently. Each eigenvector will satisfy

    f(i+1) = 1/(1 + dt*i*omega_L) f(i)

where i*omega_L is the corresponding eigenvalue. This is numerically stable
for all dt, for real or damped omega_L, though there can be a numerical
instability if omega_L corresponds to a real instability and dt*i*omega_L is
close to -1, i.e. if dt is not small enough to resolve the unstable modes.

For delta=0.5, the amplification factor will be:

                      [1 - dt * i * omega_L / 2]
    A = f(i+1)/f(i) = --------------------------
                      [1 + dt * i * omega_L / 2]

which is again numerically stable for all dt (as long as dt is small enough
to resolve any growing modes). An advantage of centered schemes is that for
real omega_L, the magnitude of A remains exactly 1, and no artificial damping
is introduced.
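For a single eigenmode, the amplification factor for general delta follows
directly from the scheme above and is easy to check numerically (a sketch;
the function name is mine, not GS2's):

```python
# Amplification factor A = f(i+1)/f(i) for one eigenmode with eigenvalue
# i*omega_L under (f(i+1)-f(i))/dt = -L(delta*f(i+1) + (1-delta)*f(i)).
# In GS2 namelist terms, delta = 1 - (fexpr + i*fexpi).
def amplification(dt, omega_L, delta):
    z = 1j * dt * omega_L
    # Solve  f(i+1) - f(i) = -z*(delta*f(i+1) + (1-delta)*f(i))  for the ratio:
    return (1.0 - (1.0 - delta) * z) / (1.0 + delta * z)
```

For delta=1 this reduces to 1/(1 + dt*i*omega_L), and for delta=0.5 with real
omega_L the magnitude is exactly 1, as claimed above.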
Note, however, that for extremely damped modes, with omega_L -> -i*infinity,
the amplification factor A will approach -1 and the solution will just
oscillate (though without amplification, so any unstable modes will
eventually dominate). For general delta, the algorithm remains stable for all
dt as long as delta >= 0.5 (and again, as long as dt is small enough to
resolve any growing modes). The fastest growing mode will eventually dominate
(assuming dt*omega_L is small enough that the theoretical fastest growing
mode remains the numerical fastest growing mode).

We recognize the solution for f(i+1) as a Taylor series expansion of the
exact solution

    f(t(i+1)) = f(t(i)) exp(-i*omega_L*dt)

Depending on exactly how the code measures the frequency (from numerically
evaluating (df/dt)/f), it should be able to measure the frequency omega_L
with an accuracy of |dt*omega_L| or better. The advantage of this implicit
method is that while |dt*omega_L| needs to be small to measure the fastest
growing mode accurately, there can be other eigenmodes, with high frequencies
or very strong damping and |dt*omega_L| >> 1, that we don't have to resolve
with a small dt. For a time-centered implicit method (delta=0.5), one only
needs dt*Re(omega) < 0.5 and dt*Im(omega) < 0.5 in order to determine growth
rates to 5% accuracy (assuming that the fastest growing mode is dominant).

One cautionary example is the following. Consider a case where there is an
unstable ITG mode at low Re(omega), and a slightly faster growing ballooning
mode at high Re(omega). The initial-value code should select out the fastest
growing mode, but if the time step is such that dt*Re(omega) is large for the
ballooning mode, then the implicitness will reduce the effective numerical
growth rate of that mode and make the ITG mode appear to be faster
growing... ??
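The cautionary example can be made quantitative with the centered-scheme
amplification factor: the effective numerical growth rate of a mode is
ln|A|/dt. A sketch with made-up mode frequencies (these numbers are purely
illustrative, not from any GS2 run):

```python
import math

# Effective numerical growth rate ln|A|/dt for the centered (delta=0.5)
# scheme, for a mode with complex frequency omega = Re(omega) + i*gamma,
# where f ~ exp(-i*omega*t) so gamma > 0 means growth.
def numerical_growth_rate(dt, omega):
    z = 1j * dt * omega
    A = (1.0 - 0.5 * z) / (1.0 + 0.5 * z)
    return math.log(abs(A)) / dt

dt = 0.5
itg        = 1.0  + 0.30j   # made-up low-frequency ITG mode
ballooning = 10.0 + 0.35j   # made-up high-frequency mode, slightly faster growing
```

Although the high-frequency mode has the larger true growth rate, at this dt
its large dt*Re(omega) suppresses its numerical growth rate below that of the
low-frequency mode, so the wrong mode would appear dominant.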
Someone could do a little analysis here to check this criterion more
accurately, and see whether the (df/dt)/f measurement could be corrected
numerically, based on the value of delta, to be virtually exact.

************************************************************
The bakdif parameter in the namelist controls upwind differencing. Leave it
at zero to get the second-order spatial differencing method used in the KRT
paper. The bakdif parameter is a second "implicit" parameter in some sense,
and was added after the KRT paper was written. Dorland and Hammett thought a
little bit about it in spring of 1999.

    bakdif = 0.0   ! no additional upwinding-type implicitness
    bakdif = 1.0   ! fully "upwinded" spatial differencing

Dorland writes: "In practice, MK and I almost always use bakdif = 0. For some
nonlinear runs, I have used bakdif = 0.1, which is small enough that the RH
effect is fully recovered for the grids that I was using. Early on in the
nonlinear development, I used bakdif = 0.5 in order to try to get some
nonlinear runs to go through robustly so I could check other
assumptions/algorithmic issues. The bakdif scheme is not exactly upwind
differencing, but it is pretty close to it -- about as close as the implicit
scheme is to textbook implicit schemes."

************************************************************
GS2 uses Gaussian integration rules to try to get higher accuracy. The
possible values of ngauss and negrid at present are:

    ngauss=10   ! # of passing pitch angles.  ngauss = 1-6, 8, 10, or 12
                ! # of trapped-particle pitch angles = ntheta/2
    negrid=8    ! # of energy grid points.  negrid = 3-12, 14, or 18.

This restricted set of possible values for ngauss and negrid exists because
Gaussian integration tables have been entered only for those values. A
straightforward upgrade to the code could allow arbitrary values of ngauss
and negrid, perhaps using second-order accurate integration for simplicity.
Or such an upgrade could still use Gaussian integration, and employ a
subroutine to evaluate the weights for an arbitrary number of points. But
cases far above marginal stability probably don't require much velocity-space
resolution. And even a small amount of collisions will probably reduce the
needed velocity resolution a lot.
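Such a weights subroutine is standard. A sketch in Python (not the GS2 grids,
which are built on pitch angle and energy with their own weightings, but the
same idea): find the nodes of the Legendre polynomial P_n by Newton iteration
on its three-term recurrence, then form the Gauss-Legendre weights, for any n
rather than only tabulated values:

```python
import math

def gauss_legendre(n, tol=1e-14):
    """Nodes and weights for n-point Gauss-Legendre quadrature on [-1, 1]."""
    xs, ws = [], []
    for k in range(1, n + 1):
        # Standard initial guess for the k-th root of P_n
        x = math.cos(math.pi * (k - 0.25) / (n + 0.5))
        for _ in range(100):
            # Evaluate P_{n-1}(x) and P_n(x) by the three-term recurrence
            #   j*P_j = (2j-1)*x*P_{j-1} - (j-1)*P_{j-2}
            p0, p1 = 1.0, x
            for j in range(2, n + 1):
                p0, p1 = p1, ((2 * j - 1) * x * p1 - (j - 1) * p0) / j
            # Derivative:  P_n'(x) = n*(x*P_n - P_{n-1}) / (x^2 - 1)
            dp = n * (x * p1 - p0) / (x * x - 1.0)
            dx = p1 / dp
            x -= dx                      # Newton step toward the root
            if abs(dx) < tol:
                break
        xs.append(x)
        ws.append(2.0 / ((1.0 - x * x) * dp * dp))
    return xs, ws
```

An n-point rule built this way integrates polynomials up to degree 2n-1
exactly, which is the accuracy payoff that motivates Gaussian rules over
second-order integration in the first place.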