History of changes to the gryffin code. This code is the result of teamwork, and it is important for us to communicate our changes to each other here. Comments are in reverse order. Start each new comments with a date and your name. This file may also be useful to document semi-official "release dates", i.e., a date which can be used to the CVS checkout command to obtain a version of the code which had been fairly stable for a while and had been known to work fairly well. For example: cvs checkout -D "1994-01-27 11:00:00 GMT" gryffin would get the first stable version of the code. One can find other release dates by searching through this file for "release date". ************************************************************************** 1999-08-14 1 AM EDT Dorland Bug fixes In the course of doing long full physics runs (trapped electrons + general geometry + impurities), Dave Ross noticed some problems with restarts. These bugs are fixed here. The affected files are step.f90 and gryffin_redist.f90. Three bug fixes: 1. The multiple domain decomposition algorithm did not work properly for iperiod = 3 with impurities. The problem was in gryffin_redist.f90. 2. There was an error in the 3+1 T_parallel equation, since the value of q_perp was not being zeroed out properly after the array was used for storage in other equations. Only step.f90 was affected. 3. The bounce-average collision operator did not treat the k_y = 0 components of the trapped electrons correctly. Only step.f90 was affected. The third problem is potentially important, and needs to be investigated further. ************************************************************************** 1999-06-17 9 AM EDT Dorland Minor changes Four minor changes: 1. Upon restart, iphi00 value is taken from restart input file rather than from save file. 2. New input variable for ETG runs. 'pfac' = B_T**2/n_19 = lambda_Debye**2/rho_e**2. 3. iflr = 0 option added => drift kinetic equations to compare with Jenko. 4. Minor bug fix in naming of results files for non-NetCDF output. ************************************************************************** 1999-03-16 22:25 GMT Hammett Makefile works for LANL Origin2000 The Makefile now works on both the LANL and Princeton Origin2000's. On hecate.princeton.edu the standard compilers work, while at LANL you have to first setup MIPSpro_7.3_beta_19990208. See todo.txt for how to track down the problem with LANL's default compilers further. I also fixed a minor bug in elliptic_int.f. ************************************************************************** 1999-03-08 04:20 GMT Hammett NAG-free version for Origin2000 This version compiles on the hecate.princeton.edu Origin2000 without needing NAG. Two new routines: nag_sub.f90 and elliptic_int.f. I wrote my own linear congruential random number generator that exactly reproduces the NAG random number generator. This allows us to set our initial conditions using the same (pseudo-)random sequence as we did with NAG, and thus we should be able to reproduce old runs exactly (to within roundoff). At present, the Makefile still uses NAG for other platforms besides the Origin2000. (FFT's require either the SGI/Cray Scientific Library or NAG, and so NAG is still needed to build on a DEC alpha.) Bill fixed a problem that showed up when multiple .res files were concatenated. ************************************************************************** 1999-03-05 05:15 GMT Hammett multiprocessing fix Release date Switched to using mp_mpi_r8.f90 when compiling with -r8 (default real is 8 bytes), such as on the Origin2000. Now we get the same answer (to within roundoff) on multiple processors or 1. Affected files: Makefile, mp_mpi_r8.f90. ************************************************************************** 1999-03-03 08:00 GMT Hammett Princeton Origin2000 port! Release date This is a fairly stable version of the code, that now works on several computers including on the Origin2000 with MPI. I haven't tried shmem on the Origin2000 yet. The execution speed on the Origin2000 is comparable to the T3E (and the compiler is much faster on the Origin2000). It is fully functional on the Origin2000, including the postc postprocesser. It may need a little more work to find the right libraries to compile with on the LANL Origin2000. I also fixed a few minor things to get it to work on the PPPL DEC Alphas. I've updated the help file ~/gryffin/doc/itg.hlp. There is a good list of things that would be useful to do someday in ~/gryffin/doc/todo.txt. I've checked that lintest0 and nltest0 (and some adiabiatic electron cases) agree well on the T3E, J90, Origin2000, and DEC Alpha, but haven't checked all of the testsuite cases. ************************************************************************** 1999-02-23 02:11 GMT Hammett Fix calls to cl_getarg Fixed subtle error which manifested itself on the Origin-2000. In the subroutine to get command-line arguments, cl_getarg(k, arg, len, ierr), len should be an output variable, not an input variable. Affected routines: command_line_alpha.f90 command_line_nag.f90 command_line_posix.f90 command_line_unix.f90 compare.f90 itg_data.f90 nc2nc.f90 nc2res.f90 postc.f res2nc.f90 Also made minor portability improvements to Makefile. ************************************************************************** 1999-02-22 09:14 GMT Hammett Upgrades to netcdf, and other changes. There are a lot more changes from the previous version than I had intended, but I think these changes will be useful. The main purpose was to get netcdf i/o working again, but other changes were made to get postc working with the 2-D parallelized version, improve portability etc. Parallel netcdf appears to have a bug, and in any case, it is useful to have a serial interface to netcdf for the many computers that don't have parallel netcdf. So this version provides a serial netcdf interaction. The option to use parallel netcdf still exists by recompiling the code with the serial_io=.false. (or making it a namelist control variable). I changed all of the input routines (wread, bwread, wreadnc) so that they allocate the necessary arrays if they haven't been allocated yet (this is needed when being called from postprocessing routines like res2nc). All of the io routines now open and close their files themselves, and so take the filename (instead of a fortran i/o unit) as an argument. In the process of these changes, I also changed funnel to define a generic funnel20 routine, and merge in calls to c2r, to simplify the usage of the funnel module in io_ascii.f90, io_binary.f90, and io_netcdf.f90 The script io_binary.sh can be used to generate io_binary.f90 from io_ascii.f90. The res2nc, nc2res, and nc2nc utilities can be used to convert between various formats of restart files (*.nc, *.res, *.resb). Examples: res2nc file1.res file1.nc res2nc file1.resb file1.nc compare.f90 is a useful utility to compare all of the double precision arrays in two netcdf files. It gives one better control over the magnitude of relative and absolute errors to report. Examples: compare file1.nc file2.nc compare file1.nc file2.nc 1.e-6 1.e-10 The latter form only reports errors that are relative errors larger than 1.e-6 and absolute errors bigger than 1.e-10. Other info from compare.f90: ! this is hopefully a generically useful utility, and does not ! link in any gryffin-specific code. ! ! this code won't notice if there are new variables in the second netcdf ! file that aren't in the first. Also, it doesn't check integer variables. ! ! Note: Some variables can appear to have a large relative ! error, but are actually zero to within roundoff. ! This compare program accounts for this in many cases, such as ! arrays calculated by FFT's, by using the avg value of an ! array as a lower bound in calculating the relative error. ! But there are still other cases where a large relative error ! might be reported. An example of this is in runs with adiabatic ! electrons, where the particle flux is by definition zero ! (though small roundoff errors may make it slightly non-zero). ! Thus, although the relative error for fluxi and fluximn might ! be reported as order unity in some cases, fluxi and ! fluximn should still be negligible in comparison with ! the ion heat flux Q_i (=wpfx). Use ncdump to check this. Other changes: xitgc.c90 uses more portable sh shell, consistent with other xitgc.* scripts Makefile made a little more portable. Now compiles on t3e, j90, Dec Alpha, Linux, Origin-2000. (Runs properly on t3e and j90, still working on other platforms.) Uses the test_os script to determine the platform type. And a few other minor changes, for example: * removed duplicate variable definitions caught by the nag f95 compiler. * It appears that a module name can not be the same as a subroutine name (or at least not an external subroutine name in a separate library). I was getting error messages about the "grid.f90" module conflicting with the grid subroutine in NCAR, so I rename grid.f90 to gryffin_grid.f90, and changed a bunche of "use grid" statements to "use gryffin_grid". This affected 74 subroutines in 23 *.f90 files. * to reduce compiler warning messages, replaced some "goto" statements with "if(skip_code) goto" statements. * added netcdf_mod.f90, a simplified f90 module wrapper fro some netcdf routines, to make io_netcdf.f90 a little easier. itg.f90 fix minor logic bug for nprnt=1 (print results every time step) linear.f90 1999/02/04 Dorland fixed an electromagnetic restart bug. linear.f90 make interface to FFT work and table generic length (*). plots.f90 minor changes to compile with postc on j90 postc.f minor changes, to use portable command_line routine, and change to new interface to io_netcdf, io_ascii, io_binary routines (the filenames are now passed to these routines, which open and close the files themselves). nlps_par.f90 linear.f90 changed interface to FFT's to use generic "assumed size" lengths (*) for table and work arrays. constants.f90 Added stuff using fortran-90's "kind" commands for portable selection of 8-byte double-precision. ************************************************************************** 1999/01/17 03:51:15 GMT Dorland Parallel (electromagnetic) version 1998-10-27 EST Dorland Parallel (electrostatic) version With much thanks to Peter Liu, Gryffin is now ported to machines with MPI or SHMEM libraries. There are no substantive changes to the input files. Here, the steps required to compile and run the code are documented, as well as some of the details of the parallelism, and a list of changes in functionality and capability. A few important points are marked with '!!'. A preliminary version of the electromagnetic terms of Snyder and Hammett is also included in this version, but should not be used without direct consulation with one of those two individuals first. The recommended way to compile on the T3E is with MPI and SHMEM. However, for runs with N_processors >= 2*md AND iperiod = 3, do not use SHMEM. !! After checking out the code, go to the src directory. Then: 1. Links !! To compile on the T3E with the MPI and SHMEM libraries: % make t3e_shmem Or, to compile on the T3E with only MPI: % make t3e To compile on the C90: % make c90 2. Load the right NetCDF module: !! T3E version: % module load USG netcdf3.5a C90 version: % module load netcdf 3. Make: !! % make itgc Running: C90: Running the code on the C90 is the same as before. T3E: To run on the T3E, you must decide how many processors to use before running. You also have to execute a special command which allows the NetCDF I/O to be done in parallel. This command is automatically issued if you use the xitgc script. Number of processors: If the number of processors (NPROC) satisfies NPROC <= (nffty-1)/3 + 1 using Fortran rules for arithmetic, the work is parallelized over only the ky index. For md <= NPROC < 2*md, there should be no significant improvement in parallel performance relative to NPROC = md. For NPROC = N*md, where N >= 2, the work is parallelized over both ky and kx. No significant improvement in parallel performance is expected for non-integer values of N when N >= 2. That is, NPROC = 2*md will probably give about the same performance as NPROC = 3*md - 1. To see how many processors are available, use the mppview command. Further information about the run-time environment on the T3E may be found at [http://www.nersc.gov]. As before, the code produces a file called runname.nc. This file can be processed with postc, which has not been ported to the T3E because NCAR graphics are not available there. Developmental notes: Overview: The code is parallelized over k_theta (ky) for NPROC < 2*md, and both k_theta and k_x for NPROC > 2*md-1. The style of programming is "data parallel." The FFT's needed for the nonlinear terms are in kx and ky; necessary redistributions are done so that each transform is done on processor. Thus, there are no calls to multiprocessor FFT's. Almost all communication occurs when the data is redistributed. The moments and the fields are distributed over the processors. Most of the control arrays and quantities like rkperp are replicated on all processors. The communication is handled by MPI through a Fortran-90 interface (mp_mpi.f90) written by Peter Liu. There is a dummy file (mp_stub.f90) for all these calls that is used when MPI is not available. In future development, all calls to MPI should be directed through the mp module, to keep the code as portable as possible. Most of the layout directives and local processor information are found in Peter's gryffin_layouts.f90. This module needs to be used whenever one makes reference to array sections, processor number, etc. The module gryffin_redist.f90 contains the routines that initialize the redistributions of the data for the nonlinear terms (and for boundary conditions if NPROC > 2*md-1). Differences in the command line syntax are handled by command_line_x.f90. The function iargc() returns the number of arguments, and the subroutine cl_getarg (mnemonic: command_line_getarg) returns the arguments. The NetCDF interface is a parallel interface on the T3E and a serial one on the C90. At present, only the NetCDF output routines are enabled. The binary and ascii formatted output routines have not been parallelized. Restarts work in exactly the same way as in the past. The nmlstrip script had to be changed to be compatible with the T3E. Details: 1. MPI The mp_mpi.f90 module was written by Peter Liu. It serves as a convenient interface to the MPI library, which has been ported to many machines. More information about MPI can be found at [http://www.mcs.anl.gov/mpi/index.html]. Peter has used Fortran-90 to overload many MPI calls and to simplify the calling syntax. The available routines from mp are: init_mp finish_mp broadcast sum_reduce sum_allreduce max_reduce max_allreduce min_reduce min_allreduce send receive barrier A few variables made available by this module are: nproc iproc proc0 A short description of each follows. nproc: This is the integer number of processors being used in the run. It is assumed to be unchanging as the code runs. iproc: This integer is the number of the local processor. 0 < iproc < nproc-1. proc0: This logical variable is true if the local processor has iproc == 0. init_mp: This should be called before any other MPI calls. An MPI initialization routine is called, and the variables iproc, nproc, and proc0 are assigned. finish_mp: This should be called after all other MPI calls. Typically, it will appear at the end of the main block of code. broadcast: This subroutine broadcasts information from one processor to all other processors. The syntax is: call broadcast(a[, i]) where "a" can be any one of: character, integer, one-dimensional integer array, real, one-dimensional real array, complex, one-dimensional complex array, logical, one-dimensional logical array and the optional argument "i" is the number of the processor from which the data is broadcast. If the routine is called with only one argument ("a") then the data is broadcast from processor zero (iproc == 0). !! Calls to broadcast should be executed on all processors. The MPI broadcast command causes the non-sending processors to wait until the data arrives. The broadcasting processor continues on as soon as the data is sent. Correct example: Assign a variable i on one processor and broadcast the value to the remaining processors. if(proc0) i = 5 call broadcast(i) Incorrect example: if(proc0) then i = 5 call broadcast(i) endif In this latter example, processors other than processor zero do not know that the data is coming, and will not receive it. sum_reduce: Sum across processors and send the result to one processor. The syntax of this call is call sum_reduce(a, i) where "a" is any of integer, one-dimensional integer array, real, one-dimensional real array, complex, one-dimensional complex array and "i" is the number of the processor that receives the answer. The routine sum_allreduce does the same thing, except that the result is sent to all the processors. The syntax is: call sum_allreduce(a) Examples of the correct usage of sum_allreduce may be found in diagnostics.f90. The thing to remember is that the part of the data in the sum that is local (on-processor) should be summed first, and then the final results gathered up. [For specific implementation notes about global sums in gryffin, search for "sum intrinsic" below.] max_reduce, min_reduce, etc., all work analogously to sum_reduce, except complex data types are not allowed. send: This routine sends data from one processor to another. The syntax is: call send(a, dest[, tag]) where "a" is any one of: integer, one-dimensional integer array, real, one-dimensional real array, complex, one-dimensional complex array, logical, one-dimensional logical array and "dest" is the processor number that the data is being sent to, and tag is something I am not familiar with. Peter says it is redundant. receive: This is the companion call to send. It has the syntax: call send(a, dest[, tag]) where "a" is as with the send command, and "dest" is the processor number that the data is coming from. !! Send and receive calls should be made in pairs. barrier: This routine causes all processors to wait until all other processors are at this point in the code. It is rarely needed. There are three occasions that require it: a. at key points in the netcdf calls (see below), b. when lots of data is being sent to one processor, and c. when race conditions may occur. For examples of (2) and (3), see init.f, where all data is sent to processor zero for writing, and gryffin_redist_shmem, where race conditions would otherwise be a problem for the rflags and sflags variables. 2. Layouts in Gryffin The data layouts are stored in a derived type variable, m_lo (mnemonic: moment layout). There are shorthand names available, as well: m_low Lowest local value of m index (ky) m_high Highest local value of m index (ky) m_alloc Appropriate upper m index for allocating arrays n_low Lowest local value of n index (kx) n_high Highest local value of n index (kx) n_alloc Appropriate upper n index for allocating arrays The routines available in this module are: init_gryffin_layouts proc_id idx_local and the data available is: m_layout_type m_lo Details: init_gryffin_layouts: This routine initializes the m_lo structure, and thereby determines how the data is laid out across processors. It has one integer argument, which should be equal to md = (nffty-1)/3+1. The m_layout_type has the information needed for communications and parallel coding. The pieces are: m_lo%iproc: integer, present processor number m_lo%md: integer, argument passed to initialization call m_lo%llim_world: integer, overall lower limit to "m" index (= 1). m_lo%ulim_world: integer, overall upper limit to "m" index (= md). m_lo%llim_proc: integer, lower limit of "m" index for this processor. m_lo%ulim_proc: integer, upper limit of "m" index for this processor. m_lo%ulim_alloc: integer, upper limit of "m" index for this processor in all allocation statements. m_lo%blocksize: integer, number of "m" indices on most processors. !! Why does the m_lo%ulim_alloc variable appear? This is needed to prevent zero-extent arrays from being passed as arguments of procedures, since this is an error in Fortran 90. m_lo%ulim_alloc is the larger of m_lo%ulim_proc and m_lo%llim_proc. Thus, arrays allocated with this index for an upper limit will never have zero size, even when NPROC > md. !! Warning: sum intrinsic: Because we are using this construct, note that the use of intrinsics such as sum, max, etc., may give wrong answers when used over the "m" index, since some parts of the arrays may have garbage in them. These operations should be explicitly carried out with Fortran-77 style (indexed) loops. proc_id: This integer function returns the number of the processor that has a particular piece of data on it. The syntax is: j = proc_id(m_lo, m, n) where m_lo is an initialized variable of the type m_layout_type, "m" is the m index of the data of interest, and "n" is the n index. This function is particularly useful when used to specify indices in send, receive and broadcast calls. idx_local: This logical function is true if the data indexed by "m" and "n" is on the local processor. The syntax is: logical :: local local = idx_local(m_lo, m, n) idx_local is useful for making sure that a section of code that refers to only one part of the distributed data is carried out only on the processor that has that data. In Gryffin, it is often used when handling the m = 1 mode (ky = 0 mode), which is often qualitatively different from the other ky modes. !! Warning: A hard-to-find, but very easy bug to insert into the code is neglect to specify the lower bounds of a passed or declared assumed-size array. Fortran-90 will assume the lower bound is 1, but the different processors expect lower bounds of m_lo%llim_proc. 3. Nonlinear layouts. The nonlinear layouts are found in gryffin_redist. Presently, the two directions not being transformed at a given point in time will be lumped together and distributed among the NPROC processors. The ulim_alloc idea is used here, too. 4. Data redistribution. The data is redistributed in redistribute. redistribute.f90 has several gather, scatter and fill routines overloaded for easy. All coding is based on routines originally written by Peter Liu. The generalizations and initialization routines were written by Bill Dorland. 5. Nonlinear terms The nonlinear terms are evaluated in nlps.f90. On the T3E, this file is a link to nlps_par.f90, which uses the gryffin_redist module. The logic is exactly the same as in nlps_serial; the only substantive changes are in the way the data is redistributed. !! Before using nlps_serial, a call to subroutine 'reality' needs to be included. The structure can be copied from the nlps_par.f90 module. !! The variable nredist can be set to be larger than 2 if there are more than two sizes of arrays expected. Right now, 2 is appropriate since in general kdpass and ldb are not equal. 6. I/O !! Only processor zero reads the input namelist. The results are broadcast to the remaining processors. Thus, if you add a variable to the input namelist, you must broadcast it after the namelist is read. This is important, and a common source of error. It may be possible in general to have all processors read the input files, but for portability, we chose this method. There is a portable version of NetCDF available on mcurie.nersc.gov. Here are some notes about it from Greg Hammett, who had good interactions with rkowens@nersc.gov. ------------------------------------------------------------------ Using NetCDF on the Cray T3E: # Set up netcdf environment variables: !! module load USG netcdf3.5a # link the code with the netCDF library: f90 options... $NETCDF # interpose the "global FFIO layer" between the system I/O and the # executable: !! assign -F global f:netcdf_file_name see coding requirements at the end of http://www.nersc.gov/~rkowen/netcdf/usage.html more info at http://www.nersc.gov/~rkowen/ http://www.nersc.gov/~rkowen/netcdf http://www.nersc.gov/~rkowen/netcdf/usage.html man netcdf ------------------------------------------------------------------ Each processor reads and writes its own section of distributed data to/from the file. Only processor zero writes the arrays that are replicated across all processors. All processors read the replicated arrays. !! Restarts work as before. That is, if you just finished a run called 'run' and wish to continue the run as 'run_1', you execute: % cp run.in run_1.in In run_1.in, change nread = 0 to nread = 1. Then, type % ln -s run.nc run_1.ncp % assign -F global f:run_1.ncp % ./xitgc [-n nproc] run_1 It would be convenient to have a script to handle this common operation. !! Things that were broken in the port: 1. Correlation functions are not calculated. 2. It is no longer possible to specify mlo and mhi -- i.e., there is always a k_theta = 0 mode in the system. 3. Binary and ASCII output files are not available. 4. The post-processor was not ported. !! Things that were fixed in the port: 1. All the testsuite cases work now. The problem with the previous .f90 code was apparently related to uninitialized arrays. !! Things that were noted in the port: 1. The l_right variable is handled in possibly inconsistent ways in some diagnostics. !! Things that were changed in the port: 1. The FFT's in kparfft are done in a different order. !! New files: command_line_nag.f90: Works with NAG f90 compiled codes command_line_unix.f90: Works on the C90 command_line_posix.f90: Works on the T3E gryffin_layouts.f90 gryffin_redist.f90 io_ascii.f90 io_binary.f90 io_netcdf.f90 mp_mpi.f90 mp_stub.f90 nlps_par.f90 (replaces old nl.f90 for parallel runs) nlps_serial.f90 (same as old nl.f90) !! Unexpectedly strongly changed files: io.f90 nmlstrip At the same time of this upgrade, the code was brought more in line with Fortran 90 standard by placing more subroutines in modules. This resulted in the following new files: bounce_avg.f90 bounds.f90 constants.f90 diagnostics_arrays.f90 dmix.f90 field_eq.f90 fields.f90 fields_init.f90 linear.f90 nlps_par.f90 nlps_serial.f90 nonlinear.f90 redistribute_mpi.f90 redistribute_shmem.f90 and these files are no longer part of the distribution: baphi.f90 moved to bounce_avg.f90 kapav.f90 moved to bounce_avg.f90 bcon.f90 moved to bounds.f90 connect.f90 moved to bounds.f90 filter.f90 moved to linear.f90 flr_terms.f90 moved to linear.f90 kparfft.f90 moved to linear.f90 nl.f90 became nlps_par.f90 and nlps_serial.f90 nonlin.f90 moved to nonlinear.f90 nonline.f90 moved to nonlinear.f90 pertb.f90 moved to fields_init.f90 poisson.f90 moved to field_eq.f90 ************************************************************************** 1998/08/28 20:57:00 GMT Beer first step at Pfirsch-Schluter closures. Adding residual flow w/modified gf closure to include Pfirsch- Schluter terms, some new diagnositcs, and a restart bug fix. Not extensively debugged. Used for Varenna '98 w/minor changes commented in step.f90 changed routines: Makefile convert.f90 diagnostics.f90 init.f io.f90 itg_data.f90 plots.f90 postc.f step.f90 ************************************************************************** 1998-06-26 10:30 PM EDT Dorland Minor changes related to f90 Yesterday and today, I tried compiling the code using different f90 compilers to see if I could isolate the problem related to multiple species cases, but found only a few minor irregularities in variable declarations. Evidently the Cray f90 compiler doesn't care if a variable is declared twice in a given procedure. Other compilers complained. None of the changes should affect the answers obtained at all. In the gryffin/geo directory, I added a file called Notes. In this file, I recorded the steps I had to take to get eiktest to compile on hydra.pppl.gov. I also removed a lot of old files that weren't being used anymore from the geometry CVS distribution. I have not figured out what is wrong with the code when run for multiple ion species. ************************************************************************** 1998-06-11 8:20 PM EDT Dorland Upgrade to allow dynamic arrays Upgrade to use dynamic array allocation. This is a (good) beta version. It is beta because it doesn't handle the multiple-species test case correctly, and because it is slower than it should be. This is a major revision of gryffin. There are two obvious changes: 1) Arrays are now allocated dynamically. 2) The subroutines have been rearranged into Fortran-90 modules. There are a few details that need to be taken care of yet, as described below. For most uses of the code, this version is adequate. The old version has been moved to the directory gryffin/src/old98. Since we want the best performance possible, and are less interested in reusing most parts of this code, I did not make the code fully modular. Also, I didn't want to change the code beyond recognition for other developers. Most variables are still in one data module, called itg_data.f90, which replaces itg.cmn and itg.par. The previous version has been moved to gryffin/src/old98. The relationship between the files from the previous version and the present version is: Old file New file Comments -------- -------- -------- aborter.f aborter.f No change average.f *deleted* Not used in previous or present version. baphi.f baphi.f90 uses itg_data bcon.f bcon.f90 uses itg_data ceil.f graphics.f90 part of graphics module cnplot0.f graphics.f90 part of graphics module cnplotk.f graphics.f90 part of graphics module compare.f compare.f not upgraded -- broken. connect.f connect.f90 uses itg_data courant.f nl.f90 part of nl module cplogp.f cplogp.f90 no dependencies critical.f itg.f90 part of itg module dm_calc.f itg.f90 part of itg module echoin.f *deleted* absorbed into init.f ecollis.f step.f90 part of step module eikcoefs.f *deleted* see gryffin/geo filter.f filter.f90 uses itg_data finish.f graphics.f90 part of graphics module fitpack.f fitpack.f no dependencies floor.f graphics.f90 part of graphics module flr.f flr_terms.f90 part of flr_terms module flrcow.f *deleted* absorbed in flr flrinit.f flr_terms.f90 part of flr_terms module geometry.f geo.f90 uses itg_data hmplot.f plots.f90 part of plots module hmplot2.f plots.f90 part of plots module hmplotc.f plots.f90 part of plots module hmplotk.f plots.f90 part of plots module init.f init.f uses itg_data, io, flr_terms, in, nl input.f in.f90 uses itg_data itg.f itg.f90 uses itg_data, in, flr_terms, nl, step, diagnostics itgc.f itgc.f uses itg_data, io kapav.f kapav.f90 uses itg_data kenrgyp.f *deleted* not used in previous or present version kparfft.f kparfft.f90 uses itg_data kxspacec.f nl.f90 absorbed into k_space kxspacece.f nl.f90 absorbed into k_space kyspace.f nl.f90 absorbed into k_space kyspacee.f nl.f90 absorbed into k_space maps.f graphics.f90 part of graphics module mnmx.f mnmx.f90 no dependencies nicer.f graphics.f90 part of graphics module nlpmp.f *deleted* never upgraded from slab code! nlpsc.f nl.f90 uses itg_data, flr_terms nlpsce.f nl.f90 absorbed in nlpsc nonlin.f nonlin.f90 uses itg_data, nl, flr_terms nonline.f nonline.f90 uses itg_data, nl nsumavp.f nsumavp.f90 uses itg_data pcons.f pcons.f90 uses itg_data penrgyp.f *deleted* not used in previous or present version pertb.f pertb.f90 uses itg_data poisson.f poisson.f90 uses itg_data postc.f postc.f uses itg_data, graphics, plots, io posten.f diagnostics.f90 part of diagnostics module preenr.f diagnostics.f90 part of diagnostics module prstep.f diagnostics.f90 part of diagnostics module radstub.f *deleted* not used in previous or present version real_comp.f convert.f90 no dependencies rpm2.f *deleted* not used in previous or present version saplot.f *deleted* not used in previous or present version setzer.f graphics.f90 part of graphics module tc_out.f itg.f90 part of itg module timeavp.f timeavp.f90 uses itg_data timestep.f step.f90 uses itg_data, flr_terms volflux.f diagnostics.f90 part of diagnostics module volkflux.f diagnostics.f90 part of diagnostics module volsq.f diagnostics.f90 part of diagnostics module volsqk.f diagnostics.f90 part of diagnostics module wframe.f *deleted* not used in previous or present version wgrad.f *deleted* not used in previous or present version wpunch.f io.f90 part of io module wread.f io.f90 part of io module wread1.f ? part of compare; not upgraded yet xmcfft.f xmcfft.f no change xspacec.f nl.f90 absorbed into real_space xspacece. nl.f90 absorbed into real_space yspace.f nl.f90 absorbed into real_space yspacee.f nl.f90 absorbed into real_space -- convert.f90 no dependencies -- diagnostics.f90 uses itg_data, flr_terms -- flr_terms.f90 uses itg_data -- graphics.f90 no dependencies -- in.f90 uses itg_data -- io.f90 uses itg_data, convert -- itg_data.f90 main data block. replaces itg.par, itg.cmn -- nl.f90 uses itg_data, flr_terms -- plots.f90 uses itg_data, graphics -- step.f90 uses itg_data, flr_terms itg.cmn itg_data.f90 no dependencies itg.par *deleted* Of course. The complete list of source files in gryffin/src of the present distribution is: File Comment ---- ------- aborter.f90 aborts with traceback and user-supplied message baphi.f90 calculates bounce-average of phi bcon.f90 enforces boundary conditions compare.f broken; should compare output from two runs connect.f90 initializes connected FFT's for kparfft, iperiod = 3 convert.f90 converts real to complex, complex to real (for backwards compatibility) cplogp.f90 takes a log of a vector with protection against zero. diagnostics.f90 diagnostic routines for itgc filter.f90 viscosity-like terms for itgc fitpack.f splines flr_terms.f90 calculates FLR corrections for Phi geo.f90 reads geometry information from file prepared by eiktest if igeo = 1 graphics.f90 ncar graphics interfaces in.f90 set defaults, read namelist for gryffin init.f initialize itgc run io.f90 I/O for data. Includes formatted, unformatted, NetCDF, although ONLY UNFORMATTED AND NETCDF I/O ARE WORKING AT PRESENT. itg.f90 main subroutine of gryffin + ancillary calls, array allocation itg_data.f90 main data block for gryffin, with array allocation routine itgc.f main program for itgc kapav.f90 Calculate pitch-angle averages of n_e, _b kparfft.f90 Calculates along the field line derivates mnmx.f90 returns min and max values of a vector and user-supplied values nc2res.f Converts NetCDF file to old .res format. I have not checked this to ensure it is working with this revision. nl.f90 Evaluates a nonlinear term for gryffin. Also includes Courant condition routines nonlin.f90 Calls nl for each ion nonlinearity nonline.f90 Calls nl for each electron nonlinearity nsumavp.f90 Calculates time averages for each m, n mode pcons.f90 Checks particle conservation pertb.f90 Sets up initial perturbation for itgc plots.f90 Mid-level graphics routines for gryffin. (contour plots, etc.) poisson.f90 Solve quasineutrality equation for Phi postc.f Main post-processor for gryffin res2nc.f Converts old .res file in NetCDF format. I have not checked this to ensure it is working with this revision. step.f90 Takes a Runga-Kutta time step, does electron collisions timeavp.f90 Get time average of vector xmcfft.f Plug replacement for mcfft Makefile GNU makefile In order to run postc from a .res, .bres, or .nc file produced before this revision, you need to add the variables malias and nalias, or skip over correlation function section. I have not checked the new version for the cases nltest1 in testsuite.980601. I have checked all the others. With one exception, the results agree to roundoff, as long as you pay attention to the new input variable "nfreq", which in the old version was effectively set to the value consistent with nez. In the testsuite, nez was always 105. Therefore, if you choose nfreq to be nstp/105 using fortran integer rules, you will get the same answer. The exact rule from the old code was nfreq = (nstp-1)/nez + 1. The exception is lintest3, the case with multiple ion species. The new version of the code works for this case only in the debugger...pretty annoying. I am trying to track this last item down, but I want to check in the lion's share of the changes now. The new version is not as fast as it should be. It ranges from 30% slower to a factor of 2 slower on the testsuite cases. There are some optimizations that can be undertaken, particularly in nl.f90, to make sure that the bank conflicts are minimized. Restarts are still a little clunky. It would probably be good to keep only one .nc file, which gets larger as the run is continued. Right now, "nfile" tells postc how many .nc or .res files to look for. There is another variable in post.in (runname.pin) called nsteps. You can ignore it for non-restarts. For restarts, it should be an estimate of the total (including all restarts) number of timesteps/nfreq, ************************************************************************** 1998-05-30 7:30 EST Dorland Bug fix for some general geometry cases BUG FIX FOR GENERAL GEOMETRY. The magnetic shear variable "shr" was not automatically set to the right value for general geometry runs that used numerical equilibria (iflux=1). The variable "shr" is used to define omega_de for trapped electrons, and kx for both electrons and ions in nonlinear runs. Thus, linear runs with trapped electrons in general geometry, and nonlinear runs in general geometry that were carried out with the code previous to this date should be checked. If the value of "shr" that is found in runname.in is close to the equilibrium magnetic shear value, the results are probably ok. Otherwise, the runs should be redone. To be precise, runs with igeo = 1 and iflux = 1, together with lin=0 or epse not equal to zero should be checked. Details: In general geometry, the magnetic shear is determined by the equilibrium data. The magnetic shear is needed for the definition of omega_de, defined in Eq. 3.15 of Beer's PhD thesis. The general geometry definition of omega_de may be inferred from Eqs. 3.14 -- 3.16. Magnetic shear also determines kx from theta_0 and k_theta, according to Beer's Eq. 4.25 (see the discussion before Eq. 4.42). The quantity kx is used only in nonlinear runs. Before this bug fix, the value of "shr" used to define omega_de and kx was the value found in the input file runname.in. This value would not typically be the same as the value associated with the equilibrium. Modified Files: gryffin/src/geometry.f gryffin/geo/et.f90 ************************************************************************** 1998-05-14 23:45 EST Beer Switched to netcdf for main output file Incorporated routines written by Phil Snyder to use netcdf for the main output file (which was *.res, and now is *.nc). He also wrote a utility to turn old .res files into .nc files (res2nc) and vice versa (nc2res). This change should be relatively transparent, as long as netcdf is available. On the a-machine, this is accomplished by "module load netcdf" before making the code. If netcdf is not available, the option to write (and read) binary remains (binary=.true.). Changes: Makefile init.f itgc.f postc.f xitgc xpostc New files: wreadnc.f wpunchnc.f nc2res.f res2nc.f Note that as of last checkin, f90 is default compiler. ************************************************************************** 1998-04-12 17:00 EST Dorland Geometry routines separated from main code To make porting of the code easier, the geometry routines have been removed from the main code. Here is how the new arrangement works: Assuming you have working versions of the codes: The geometry routines and sample input files are found in the geometry subdirectory of the gryffin distribution. Prepare the file eik.in for eiktest. Run eiktest to get a file called eik.out. Put either the real eik.out or a link to it in the same directory with your itgc input file. If you like to keep good records, put eik.in and eik2.out (or links to them) into the same directory with your itgc input file. Set igeo=1 in the runname.in input file for gryffin. Run gryffin with xitgc runname and the xitgc script will copy eik.in, eik.out and eik2.out from the eiktest run into runname.gin, runname.geo and runname.geo2 for future reference, if any subset of these files (or links to them) are found in the same directory as the itgc input file. The input file for eiktest is eik.in. Typical runs for itgc will have the control parameters: bishop = 4 iflux = 0 nperiod >= 1 + int((x0+pi)/(2*pi)), where x0 is set in runname.in. ntheta >= (ld-1)*pi/x0 irho = 2 itor = 1 local_eq = .t. delrho = 1.e-3 equal_arc = .t. isym = 1 ismooth = 0 together with the physics parameters: rhoc == normalized minor radius of flux surface of interest, as before s_hat_input = rho/q dq/drho qinp = q shift = d R_0/drho akappa = elongation akappri = dk/drho tri = triangularity tripri = dtri/drho Rmaj = R_0 of the LCFS R_geo = Rmaj for most applications The values of qsf, shr, and eps are ignored if igeo = 1. I believe that if igeo = 1 and epse = 0, the trapped electron model is not used. If epse is not zero, then the trapped electron model is included. However, the actual value of epse is not used. NOTE: setting nperiod too small may cause itgc to bomb mysteriously. As documented elsewhere, eiktest can also read EFIT, TOQ, JSOLVER, CHEASE and VMOMS output files with the right controls in eik.in. To get working versions of the code: When you check out a version of the code, you will find that there is a subdirectory of the main directory called geometry. If you wish to run with igeo=1, you will need to go to geometry and execute make eiktest There may be minor problems with the Makefile as it comes out of the CVS library at the moment, because I haven't updated the configuration scripts. Depending on what has been checked into CVS, it may have the paths to the "utils" directory incorrect. It should be "utils", not "../utils". If this command fails, you may need to go into geometry/utils and type configure followed by the make command. I will clean this mess up before too long. If you can't get a compiled copy of eiktest, ask bdorland@ipr.umd.edu for help. ************************************************************************** 1997-09-04 03:10 GMT Hammett minor improvement to Makefile to eliminate annoying spurious compiler warnings. ************************************************************************** 1997-09-02 14:30 GMT Beer reverted to release of 1997-08-27 ************************************************************************** 1997-09-02 13:10 GMT Beer temporary check-in, version used for APS '96 Temporarily checked in the following four files so code can reproduce linear runs in APS '96 paper: eik2.inc, init.f, geometry.f, eikcoefs.f. There are two bugs in these geometry routines which will be fixed in future releases. Nonlinear runs in this paper do not have the reality condition enforced, which changes runs only slightly. ************************************************************************** 1997-08-27 06:00 GMT Beer general geometry trapped electrons Added numerical bounce averages and kappa integration weights in general geometry. The only file affected is init.f. The coding is not beautiful, but it works most of the time. We're putting this in CVS now to avoid holding up other code updates. Once the most recent geometry routines are checked in, this code should be able to reproduce curves in our APS '96 paper. ************************************************************************** 1997-08-21 23:50 GMT Beer fixed kperp rho variation and minor changes Benchmarks of this version in cfs /001768/testsuite/testsuite.970823 Changed rkperp2 to include variation of B in kperp rho: init.f, geometry.f, itg.f. Changed connect.f to allow 2pi * integer length boxes with connected FFT boundary condition. ************************************************************************** 1997-08-20 21:50 GMT Hammett Minor changes Minor changes to Makefile, compare.f, wread1.f, and itg.in. There should be no difference in any output. compare.f was broken, now fixed. wread1.f used by compare. This is a copy of wread which reads only up to the first major data array, the density array. Makefile cleaned up a little bit. itg.in added a note to clarify that the width of the extended simulation region should be a multiple of 2*pi. This insures that the m=n=0 mode is properly mapped to the constant component in in the extended domain as well. ************************************************************************** 1997-08-15 17:35 GMT Hammett Removed some obsolete files Release date Benchmarks of this version in cfs /001768/testsuite/testsuite.970819 I've removed some obsolete files from gryffin/src which haven't been used in a long time. They will still show up if an old version is requested. They are also in the gryffin/src/old directory just in case someone wants to browse through them. gryffin/src/old94 contains even older files which are no longer used. The deleted files are: cfinish.f copy.f craydum.f craymake craymakep craymakerw craypost ddx.f flrinitrw.f flrrw.f goonc itgp.cmn kparfftp.f kxspace.f kxspacecp.f kyspacep.f makefile makepost nlpm.f nlps.f nlpscp.f oplot.f penrgy.f plots.cmn plots.f prtime.f radial.inc smplot.f spread.f sundum.f timesteprw.f tmm.f tridagc.f width.f wprint.f xspacecp.f yspacep.f ************************************************************************** 1997-08-15 15:50 GMT Hammett Moved from RCS to CVS Release date The present source code is still identical to the previous version from 1996-09-24 20:00 GMT. It has only been moved from an RCS version control system to the CVS version control system (and some minor directory reorganization begun). All previous versions which were in RCS are still accessible in the CVS repository. For example, to get one of the older versions described below which had been in RCS, do: setenv CVSROOT /afs/pppl.gov/u/qpliu/cvsroot cvs checkout -D "1996-05-31 11:33 EDT" gryffin The RCS library /afs/nersc.gov/u/p10/u1768/itgc/RCS/ has not been changed since the "1996/09/24 20:00 GMT" changes noted below. The only change is that the main directory has been renamed from itgc to gryffin, and most of the source files are in gryffin/src, except for a few documentation files in gryffin/doc. ************************************************************************** 1996/09/24 14:00:00 EDT Dorland Updated geometry routines 1996/09/24 20:00 GMT should be used as the date instead (some changes were slightly after 14:00 EDT) Improved the geometry routines (geometry.f, eikcoefs.f) to bring them up to date with what is in the CVS NT library. The old option iflux=0 is now iflux=-2, and is not preferred. One should instead use iflux=0 if one is interested in specifying a flux surface in terms of a few parameters, and iflux=1 if one is interested in using a real equilibrium. There are a few new input variables in the "stuff" namelist. Along with the previous set, they are used if iflux=0: ssdel Shafranov shift (Delta) shift d Delta/d rho (same as before) shiftpri d shift/d rho akappa elongation (k, same as before) akappri d k /d rho (same as before) akappri2 d k'/d rho tri triangularity (tri, same as before) tripri d tri/d rho (same as before, but now used) tripri2 d tri'/d rho dIdrho d I /d rho, where I is a flux-surface quantity defined by B = I grad zeta + grad zeta x grad Psi I think that dIdrho should only enter in the grad B drift calculation. However, at the present time, it enters into both the curvature and grad B drift calculations. The set of include files is different now. itgc needs: itg.cmn itg.par eik.par eik.inc eik2.inc ipack.inc rpm2.inc share.inc vshare.inc That is, nt.par and radial.inc are not required, but share.inc and eik.par are. This kind of bookkeeping is much, much easier in CVS (as opposed to RCS) since CVS keeps track of what files are associated with a given project. ************************************************************************** 1996/09/16 8:00:00 ?? Dorland Reality condition enforced Kerbel pointed out that the reality condition was not properly enforced for our perturbation. I fixed this and also enforced the condition in bcon just to be sure. ************************************************************************** 1996/05/31 11:31:00 EDT Beer Fixed electron bugs & impurity bug Fixed several problems with the trapped electrons, and a bug in the impurities for charge.ne.1. I found these by benchmarking in the local limit (ld=1) with an independent code which solves the local dispersion relation. I've also implemented a new electron collision model which uses weaker collision coefficients in the higher moment equations. It does a little better than the previous simple model with constant coefficients, but it's not perfect yet. The tridiagonal loops in ecollis.f had to be broken up to deal with the non-constant coefficients. Changed files: ecollis.f new collision model flrinit.f fixed impurity charge bug init.f factor of 2 bug in defn of omegade, added s-alpha for electrons input.f fixed problem deriving electron closures, leading to new electron closure coefficients prstep.f Ti/Te bugs in electron fluxes, 3/2 bug in electron heat flux timestep.f Ti/Te bugs in electrons nt.par checked in, was missing in RCS ************************************************************************** 1996/04/22 24:30:00 EDT Dorland General geometry update I fixed some problems that were buried in the eikcoefs subroutine. The shift and triangularity options now work more reliably, for example. Also, in addition to EFIT output, this code can now use equilibria generated from VMOMS to specify the geometric quantities by choosing vmom_eq=.true. in the 'stuff' namelist. The vmoms data should be found in a file called 'test.psi' that is generated by the version of the VMOMS code available from Dorland. A possibly annoying feature of the changes introduced here is the profusion of include files. One must now have these files present: share.inc vshare.inc radial.inc (instead of radial.cmn) ipack.inc (instead of ipack.cmn) If you don't want to deal with all of this, you can comment out the single call to 'geometry' in init.f, use igeo=0, and change the Makefile to not use geometry.o in the itgc build. This version does not produce exactly the same results as the previous version if igeo=1, because some root finding routines were changed. However, the results should be very similar (for linear runs, especially). I would like to move all of this RCS structure into a common CVS structure that includes all of the gyrofluid/IFS/PPPL codes. The CVS structure is at PPPL in /afs/pppl.gov/u/qpliu/. The advantage to the move would be that only single copies of eikcoefs, fitpack, and other pieces of code that are being shared would be required, and the CVS modules can be used to manage things more efficiently than at present. ------------------------------------------------------------------ unrelated bug found: I think that the shr=0, nonlinear runs had a problem in Poisson's equation related to the treatment of the electron response. Changes: flrinit.f ************************************************************************* 1995/08/10 21:10 GMT Hammett Major coding clean up. Release Date. I made a lot of major changes to the coding, which are mostly cosmetic and do not affect the physics. The main new features are: (1) You can now have multiple ITG jobs running simultaneously on different input files in the same directory. (1a) Each run can now have its own post-processor namelist file. If *.pin (the name of this file) doesn't exist, then it copies over post.in to use as a default. Everything should be upwards compatible so that "xitgc runname" works normally. (2) The postprocessor has been upgraded to abandon the old cosine/sine notation, and uses complex variables throughout, with the same names as in the main code. postc.f now uses the same 'itg.cmn' common block as does the main code. (3) Moving towards more flexible i/o, so we don't get frozen with the format of the *.res file. This isn't the perfect i/o system, but we no longer have to maintain the compatibility of read and write statements in wpunch.f, wread.f, and postc.f. The section in postc.f which used to read the *.res file has been replaced with a call to wread.f, and wread.f is generated by running wpunch.f through a filter to replace all "write" statements with "read" statements. (3) Minor bugs fixed: wpunch.f and corresponding read routines have been made compatible (tau had been written twice). (4) The *.m plot file is backwards compatible, so you can verify that the new version of the code reproduces the results from an old version of the code by typing "diff oldrun.m newrun.m". To reproduce the date/time label at the bottom of old plots, you have to set debug_plotlabel variable in the newrun.in namelist file. (5) Minor bugs noted but not fixed. Details on the above: (1) You can now have ITG jobs running simultaneously on different input files in the same directory. The code no longer references generic files like itg.res which later get renamed to runname.res. Instead, the code reads and writes directly to files like runname.res. This should be transparent to most users. You still type "xitgc runname" to run the full code. This also makes debugging a little bit easier, as you don't have to manually set up all of the itg.* input files before running the code in the debugger. Instead, type "cdbx itgc", and then in the cdbx input window type "run RUNNAME" to pass RUNNAME to the code as a command line argument. Note that the X-windows version of cdbx has a bug and is not able to read the source files out of ../src. Either copy of the *.f source code you want to debug to you present directory, or use the line-oriented version of cdbx, by typing: "cdbx -L -I ../src itgc". (5) Minor bugs noted but not fixed. ** kx-spectra made by postc.f did not plot the highest kx point. That can be fixed by replacing ndpmaxerr with ndpmax. The bug was left in to allow exact reproduction of old plot files, so the code can be tested easily by typing "diff run1.m run2.m". It still has a bug in that it doesn't average over negative kx's. That should be fixed eventually. ** Mike made some small changes to postc.f for negative shear on 1995/06/13 which didn't make it into the general geometry version. I put them back in. ** beginning with the trapped electron version 4.0, the format of wpunch.f and of the reads in postc.f were not backwards compatible with older versions of the *.res results file. It would be nice to change it at some point so our current post-processro can go back and read results from older runs, particularly the large SPP run we did once... ************************************************************************** 1995/07/22 03:19:01 GMT Hammett minor bug fix Put the fave parameter back into post.in, otherwise get random time averaging. (This fixes a bug which existed only since 6/27/95). ************************************************************************** 1995/06/27 08:17:13 Greenwich(?) time Dorland General geometry revision Sorry this is so long, but it's better to be explicit than shy when I am so far away. Each of these changes is described in more detail below: (1) Added general geometry option, as described in Ch. 4 of Beer's thesis. (2) Added option to calculate the critical gradient (3) Added option to calculate diffusion necessary to stabilize given modes. (4) Changed main code into subroutine to allow easier incorporation into other codes. (5) Minor fixes/enhancements (6) Related things that still need to be done New files: eik.inc geometry.f dm_calc.f itgc.f (new main program) eik2.inc volflux.f critical.f eikcoefs.f volkflux.f tc_out.f Revised files: itg.cmn itg.in post.cmn post.in Makefile courant.f filter.f input.f pertb.f preenr.f volsq.f wread.f echoin.f flr.f itg.f poisson.f prstep.f volsqk.f ecollis.f flrinit.f itgc.f postc.f timeavp.f wframe.f init.f kparfft.f posten.f timestep.f wpunch.f Files no longer used (unless someone objects): prtime New output files created ************************ There are a few new output files that may appear in different running modes: eik.out A list of the non-normalized geometric coefficients eik2.out A more useful file containing geometric quantities. eik3.out The (-pi,pi) part of the theta grid used by eikcoefs dmix.crit Some namelists that may be useful in the future for on-the-fly calculations. itg.freq A short printout of the frequencies, growth rates, and D_ML values. -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* (1) General geometry (axisymmetric) ****Not yet implemented for non-adiabatic electrons**** Input variables and general background: --------------------------------------------------------------------- igeo=0 -> no change; should be the same as previous version of the code. igeo=1 -> use the 'geometry' subroutine to calculate the theta-dependent coefficients described in Ch. 4 of Beer's thesis. Also, include Jacobian factors where appropriate. The basic code (eikcoefs) that calculates the needed geometrical factors is being used in a few other codes as well: Kotschenreuther's initial value kinetic code (gstotal), his eigenvalue (on-the-fly) code (htotal), and Hammett's transport code (ptor). There were some minor tradeoffs made in gstotal, htotal, itgc, and in the ptor implementation to allow one routine to fit every code. The principal tradeoff with respect to itgc was that the theta grid that the coefficients are calculated on is not exactly the theta grid itgc uses; an interpolation is performed to translate between the two. (In fact, none of the fully developed larger codes uses exactly the grid that eikcoefs employs.) Eikcoefs was originally developed by Kotschenreuther (who was prodded by Hammett) and extensively revised and upgraded by Dorland who learned the general geometry from Beer's thesis, from Kotschenreuther, and from the references in Beer's thesis. There are several additional options available if igeo=1. They are found in the 'stuff' namelist, below the 'wdat' namelist in the input file itg.in. In general, it is safe to comment out variables; the default variable set corresponds to circular flux surfaces without making the high aspect ratio expansion. Here are the choices: Level of approximation: itor=0 -> assume circular, concentric flux surfaces with low beta, high aspect ratio expansion. This option is almost the same as igeo=0. The only differences are additional factors of the Jacobian in the volume integrals, and a small change in precision traceable to the inter- polation of the coefficients from the basic theta grid onto the theta (z) grid used in itgc. itor=1 -> Compute general axisymmetric coefficients. Axisymmetry was assumed in the sense that the nu function introduced in Eq. 4.9 of Beer's thesis was assumed to be independent of phi in the calculation of alpha. Symmetry: Up-down symmetry may be forced by choosing isym=1. Minor radius: The preferred method for specifying the minor radius is with the input variable 'rhoc'. However, unless iflux=1 (specifying the EFIT interface described below) 'eps' may still be used. Simply comment out rhoc or set it equal to -1 to use the value of r implied by eps. There are three choices available for the radial coordinate (rhoc): irho=0 -> rhoc=a/A, where a is the area of the desired flux surface, and A is the area of the last closed flux surface (LCFS). irho=1 -> rhoc=sqrt(Phi/Phi_a), where Phi is the toroidal flux enclosed by the desired flux surface, and Phi_a is the toroidal flux enclosed by the LCFS. While this definition is in common use in the community, it is a non-local definition, since it can depend on gradients far interior to the desired flux surface. Phi is calculated by integrating the q-profile, *** thus irho=1 is not allowed unless iflux=1, **** in which case the EFIT q-profile is used. [I compared the EFIT q-profile to the calculated profile, and they agree very well.] irho=2 -> rhoc=d/D, where d is the diameter of the desired flux surface, and D is the diameter of the LCFS. This is the default and recommended choice. It is a local definition (like a/A) but is very fast to calculate. Values of rhoc < 0.05 are difficult to calculate if iflux=1, and may cause the code to bomb. rhoc=0 is not allowed. Also, rhoc > 1 is not guaranteed to make sense. Note that the choice of irho (a) implies a particular definition of the scale lengths, e.g., 1/L_T = - d ln(T) /d ln(rho), (b) chooses the central radial position of the simulation, and (c) causes psi (Beer's choice for the radial coordinate in Eq. 4.9) to be replaced by one of these three radial variables. All normalizations are taken care of; in practice I find that there are typically only small differences among the coefficients generated by the different flux surface labels. (d) always yields a variable that can range from ~0 to 1. [Remember that q(95) is usually defined to be the value of q(psi=0.95*psi_LCFS) despite the fact that no major tokamak analysis codes use psi as the radial coordinate.] If itor=1 there are three options available to specify the geometry: iflux=1 -> Read in Psi, f(psi) as generated by EFIT. All of the following inputs are ignored if iflux=1: qsf, shr, akappa, akappri, tri, tripri, shift, eps The appropriate values are calculated from the EFIT equilibrium. eps is overridden by the choice of rhoc, which is more convenient in this case. A message to this effect is written to the screen to remind the user. One must also specify where the EFIT data is, using the 'efit_data' namelist at the bottom of the itg.in input file. EFIT output files are labeled with prefixes, the shot number, and the time. itgc requires only the file beginning with the prefix 'g0'. For example, a file for shot number 82205 at 2.4 seconds (included in the RCS library for reference) is called 'g082205.02400'. (I generated it myself using EFIT; it should not be considered a reliable calculation for this particular experiment, but it is a perfectly good test case for itgc.) 'big' is an integer expansion factor to improve the interpolation of the EFIT data onto the (r,theta) grid. iflux=0 -> Use Kotschenreuther's formulas (found in rpfun) to generate the appropriate coefficients for each of akappa Standard elongation, kappa akappri d kappa/d rho tri triangularity (broken at the moment) tripri d tri/d rho (not implemented yet) shift d Delta /d rho iflux=-1 -> Very simple equilibrium calculation generated with a separatrix and with elongation specified by akappa. airat should also be specified, but this option is NOT RECOMMENDED AT THE PRESENT TIME The numerical parameter delrho determines the denominator in expressions of the form d/drho and indirectly in expressions like d/dpsi. It should be ok at the default value. If it needs to be adjusted downward, the code warns you and then does it. The two remaining variables in the 'stuff' namelist are ntheta and nperiod. These values default to the appropriate values and can be safely ignored. You may find that high-resolution theta grids require the 'ntm' parameter in 'eik.inc' to be increased. If this occurs, the code will stop, and a message will be written to the screen telling you the appropriate value to use. Other changes in the code related to geometry: --------------------------------------------------------------------- Volume integrals are defined as Int(dV A) -> Int(dtheta dalpha drho J A) where J is the Jacobian. In earlier versions of the code, the Jacobian factor was not included in the volume integrals, despite a 1/B(theta) dependence that was nominally of higher order, but retained in some other places. The only significant difference in the results between igeo=0 and igeo=1, itor=0 should be related to this effect. That is, when igeo=1, the Jacobian factors are included in the volume integrals. ****I have ignored Jacobian factors in the correlation function integrals for the moment. This should be updated.**** A similar factor appears in the surface integrals, discussed below. In general, the quantities we are most interested in were originally divergences that have been flux-surface averaged. These quantities, such as the heat and particle fluxes, are now calculated in separated routines from ordinary volume averages. [Note that there is a minor misprint in the definition of the flux-surface average that appears in the Appendix of Beer's thesis. The factors of |grad psi| should not be present; the definition should read < Phi(psi) > = Int(dalpha dz J Phi) / Int(dalpha dz J).] [The following comments also appear in volflux.f:] For a vector flux F, the quantity that is needed for transport calculations is dV/drho* < F.grad(rho) >, where <...> signifies a flux surface average: < G > == Int(dtheta dalpha drho J G) / Int(J dtheta dalpha drho) / == Int(dtheta dalpha drho J G) / dV/drho * delta_rho / Taking delta_rho -> 0 is the definition of the flux surface average. Our simulation domain is small enough that averaging over the entire radial domain is ok. Comparing the denominators above, dV/drho = 2.*pi*Int(J dtheta) in our coordinates, since J is only a function of theta. volflux actually calculates < F.grad rho > / < |grad rho| > = Int(dtheta dalpha drho J |grad rho| F_rho) / Int(J dtheta dalpha drho) ----------------------------------------------------------------------- Int(dtheta dalpha drho J |grad rho| ) / Int(J dtheta dalpha drho) F.A delta(rho) = -------------- A delta(rho) so that, for example, the heat flux that is reported is the heat flux per unit area. This quantity is calculated assuming that F is an ExB type flux, and that a and b are the parts whose product is the generalized radial (rho direction) component of F, denoted F_rho. Note that < |grad rho| > * dV/drho is the area of the flux surface. dV/drho == 2.*pi*Int(dtheta J(theta)), where -N*pi <= theta <= N*pi. This modification for igeo=1 is useful because it reduces to the previous definition used for the volume-averaged flux in the circular flux surface limit. Also, normalization issues associated with our box actually extending multiples of 2*pi in the theta direction are cleanly handled with this choice, as follows: Our reported heat flux fits into the the fundamental transport equation d nT/dt + 1/V' d/drho V' < Q.grad rho> + ... = 0 like this: d nT/ dt + 1/V' d/drho A Q_itgc + ... = 0 where Q_itgc is the heat flux calculated with volflux, and A is the area of the flux surface. In this equation, V' should be the physical V' (that is, using (pi < theta < pi) in the theta integration). This transport equation can be rewritten (using definitions above): d nT/ dt + (< |grad rho| >/A) d/drho A Q_itgc + ... = 0. In this form, there is no ambiguity about which V' definition to use; one just calculates < |grad rho| > and the surface area, plugs in the heat flux calculated from itgc, and that's that. Note that there will be an additional factor of <|grad rho|> (typically stabilizing, I believe) that this definition of Q fails to capture. The GA people tend (as I understand it) to lump the factor of <|grad rho|> out front together with the factor in the Q definition, and thus talk about a <|grad rho|**2> effect on the anomalous transport coefficients. This approximation fails to capture all of the radial dependence of <|grad rho|>. Note that the definitions of the gradient scale lengths as 1/L_T == - d ln(T) / d ln(rho) become functions of the choice of rho, the flux-surface label. If one is calculating everything within this ptor-itgc-eik code, there is no complication arising from this observation. One may use any definition available. However, otherwise one must be careful to keep everything consistent. Factors of drho_1/drho_2 appear, that are not explicitly reported in general at the present time. Mike K. and I suggest that since we are performing local simulations (in radius), the default definition of rho that we choose should be a locally defined quantity. Furthermore, it should be very fast to calculate (to keep the time spent in the eikcoefs subroutine to a minimum). The definition that best seems to fit these two criteria is rho_d = d/D where d == the diameter of the flux surface, and D == the diameter of the last closed flux surface. Any interpolation formulas or databases, etc., that we develop will probably use this definition of rho unless someone thinks of a good reason to go with another choice. Definition of D_ML ****************** gamma/ is generalized to gamma/<|grad S|**2>, where S is the eikonal from ballooning mode theory. Since k_x**2 is not defined for igeo=1, this output variable was not updated in postc. Neoclassical damping of poloidal flows in general geometry: *********************************************************** rkx2, rky2 are generally replaced by rkperp2. rkx is not explicitly calculated if igeo=1; this will cause a problem for studies of neoclassical damping of poloidal flows in general geometry, because the initialization in this case uses the variable rkx. Also, I have not yet updated the definitions of the poloidal and toroidal flows in prstep if igeo=1. bmag -> b_mag ************* The magnitude of the magnetic field variable was renamed to avoid a name collision. It is now b_mag. If itor=1, it includes the magnitude of the poloidal field as well as the toroidal field. ---------------------------------------------------------------------------- (2) Option to calculate the critical gradient ****Not yet implemented for non-adiabatic electrons**** This is a simple option chosen by the input variable icrit=1. The algorithm used is to assume that, for a given k, the growth rate is linear in the parameter Delta == (R/LT - R/LTcrit). Then, R/LT is adjusted until gamma=gamma_0 and again until gamma=gamma_0/2. The critical gradient is estimated by linearly extrapolating from these two values of R/LT. Right now, the algorithm assumes that all of the density gradients and temperature gradients for the ion species are the same for simplicity. ***The algorithm works best if you start with a large value of etai.*** The code runs until all R/LTcrit for each of the selected modes has been calculated, or until ncycle=nstp. Modes with different k_theta's are advanced with different time steps, to make the calculation more efficient. Modes with k_theta=0 are ignored. Notice that the smaller k_theta's may get pretty large time steps, so that you may not be able to choose very large values of dt0. So far, the biggest problem that I have noticed with the toroidal gyrofluid models in itgc is that the critical gradient differs from the kinetically calculated critical gradient by as much as 40%. This problem seems to be particularly acute for small values of k_theta, or when omega ~ 1/qR. Itgc will probably not work well as an on-the-fly code until we can straighten this problem out. ---------------------------------------------------------------------------- (3) Option to calculate the diffusion required to stabilize the mode ****Not yet implemented for non-adiabatic electrons**** Chosen by icrit=2. The algorithm is again simple. The diffusion D_mn for each mode is adjusted independently until (gamma/)_mn = rdiff_0. Then, D_mn + (gamma/)_mn is reported as the diffusion required to stabilize the mode. This algorithm is the fastest, most robust way I know to get linear estimates of D_ML. I have played a lot of games, like keeping the diffusion only in certain equations, varying the relative values of D_mn used in each equation, etc. The results are pretty insensitive to all of these tricks. Another nice thing about this calculation is that the requirements for the maximum x0 are very mild. Again, modes with different k_theta's are advanced with different time steps. Modes with k_theta=0 are ignored. Including damped modes slows down the calculation, because they take a long to time to settle down. Notice that the smaller k_theta's may get pretty large time steps, so that you may not be able to choose very large values of dt0. ---------------------------------------------------------------------------- (4) Made itg a subroutine. itgc now contains the main program block. icontrol is a small array of switches for controlling itgc's actions in other modes besides the one we typically choose (nonlinear simulations). For example, it is useful to be able to call itgc to calculate D_ML on the fly. ---------------------------------------------------------------------------- (5) Minor fixes and enhancements: I included an option to specify x0 = N pi described in itg.in. It's easier than typing in multiples of pi all the time. The timestep plot was messed up because of an indexing error in postc. I commented out the call to prtime, which appeared to be really ancient. In post.in, if igraph=0, no graphic files are generated. ---------------------------------------------------------------------------- (6) Related things that still need to be done: --Improvements to the model equilibrium (iflux=0) will be the first order of business. --The burst page in the *.m file does not reflect any of the changes listed here. --Some plots of the actual flux surface would be nice. I've done this with supermongo, but I haven't transferred the know-how to ncar yet. --No changes were made to the electron sections, except in volsqk and volkflux. In particular, electron variables should be included in the filter subroutine if icrit=2, since the diffusion is meant to mimic the effect of the nonlinear terms. I don't know what to do about the pitch angle scattering terms under this option. --Also, if icrit=1, more flexibility should be allowed with respect to which driving gradients are adjusted. --The correlation function calculations should be checked for Jacobian factors. --The toroidal and poloidal flow calculations need to be updated. --Restarts with icrit.ne.0 have not been made possible. ********************************************************************** 1995/05/24 15:01 EDT Beer Trapped electron revision Added trapped electrons, as described in Ch. 3 of my thesis. With epse=0.0, the electrons are adiabatic and the code should reproduce results from the previous (adiabatic) version of the code. The new variables are described in itg.cmn; the most important ones are the electron moments which are functions of kx, ky, and pitch angle (kappa), instead of the along-the-field coordinate. There are new routines to calculate the bounce averaged potential and perform the velocity space (kappa) averages (baphi.f and kapav.f), and the weights used to perform these integrations are pre-calculated in init.f. The pitch angle scattering electron collision terms are calculated in ecollis.f and tridagc.f, unless the electron collisionality is zero (nueeff=0.0). The new nonlinear electron routines (nonline, nlpsce, kxspacece, kyspacee, yspacee, xspacece) are just the ion nonlinear routines with theta replaced by kappa. The other new routines are for volume averaging (volsqk) and plotting (hmplotk) the electron moments. For the moment, poisson.f uses at the previous timestep instead of calculating self-consistently at the current time. I'll try to improve this in the future. New diagnostics are plotted by postc.f (electron heat flux, particle flux, electron harmonics in pitch angle space, etc.) and most of the other changes are to calculate and pass these new arrays to postc. With trapped electrons, I've only benched with the fully connected BCs, so use iperiod=3. Finally, I changed the theta_0 spacing for linear runs so the theta_0's are evenly spaced instead of having kx's evenly spaced. This makes linear runs with connected FFT's more efficient. Changed files : Makefile connect.f echoin.f flrinit.f init.f input.f itg.cmn itg.f itg.in itg.mom itg.par pertb.f poisson.f post.cmn postc.f prstep.f sundum.f timestep.f wpunch.f wread.f New files: baphi.f ecollis.f hmplotk.f kapav.f kxspacece.f kyspacee.f nlpsce.f nonline.f pcons.f tridagc.f volsqk.f xspacece.f yspacee.f ********************************************************************** 1995/03/28 01:45 PST Beer Toroidal FLR, new closure coeffs, connected FFTs, energy cons, and new plots Updated RCS version to use the toroidal FLR and new closure coeffs as described in Ch. 2 of my thesis. The current model is iflr=8; the option to use the non-O(b) accurate FLR terms in Ch. 2 [Eqs.(2.33- 2.36)] is included (iflr=9) but not recommended. The old models are retained for backwards compatibility, but remember to put in the old closure coeffs if you want to exactly match an old run. The new toroidal closure coeffs are now hardwired in input.f, so you don't have to worry about changing itg.in when using 3+1 vs. 4+2. When putting the toroidal FLR terms in, I fixed a bug in the omega_d Phi terms. I believe these were off by a factor of T_i/T_I, which is usually only significant for beams. The flux surface averages in poisson.f now use dl/B. Pertb.f updated for nice flow initializations (used in Sec. 5.3.2 of my thesis) or bumpy equilibria if iphi00=9. Changed the omegad definition by a factor of -2 to make it consitent with Eq.(2.10), but this is only internal to the code: the inputs don't change. Also, as described in Ch. 2, the B|k|(q/B) terms are now |k|q, which removes the spurious instability in the bumpy torus limit. Added option (iperiod=3) to use "connected FFTs" as described in Sec. 4.7, by stringing all modes connected by the parallel BC onto one long parallel domain. Two new parameters added to itg.par: ndomz, the max number of different FFT lengths, and nconz, the max number of modes which can be connected together. Both must be set to one (nconz=ndomz=1) if using iperiod=0, 1, or 2. If using iperiod=3, the code will tell you how large they have to be, but then it has to be remade with the new itg.par. connect.f is a black box that calculates which modes get connected. Added 3+1 energy conservation diagnostics as described in Sec. 5.2. This is only exact for eps=0. Added option to do time averages over the last "fave" fraction of the run, i.e. if fave=0.75 in post.in, time averages will be over the last 75% of the run. Added new correlation function plots and flux surface averages of potential, upar, and pressure. Also added k-space contour plots of various quantities. Total affected files: itgc: postc: other: bcon.f cnplot0.f Makefile connect.f cnplotk.f post.in flr.f hmplot.f flrinit.f nsumavp.f init.f timeavp.f input.f itg.cmn itg.in itg.par kparfft.f poisson.f pertb.f posten.f preenr.f prstep.f timestep.f wpunch.f 1994/10/24 11:45 EDT Hammett & Beer Fixed Small Bug in timestep.f Fixed a bug in the u_par2*bgrad term in the T_perp equation. This only affects simulations with finite eps (i.e., including the magnetic mirroring forces) which usually doesn't have a big effect. This bug has been around since version 3.0. Mike Beer found and fixed it in his fully-connected version of the code sometime in the summer of 1994. Mike has also changed other toroidal terms in his versions of the code. These changes are all FLR corrections to toroidal terms only, and might still be subject to change as various models are tried. They will eventually be checked in to the RCS library when Mike has finally settled on an acceptable model. 1994/08/29 05:00 EDT Hammett & Beer Fixed Volume-averaged normalizations. Fixed normalization in volsq.f. The parallel step size for the volume integral had been too small. Old results using the "equal-extension" method should be multiplied by x0/xp to get the right answer for all volume-averaged quantities (like the heat flux, spectra, etc...). Also added windowing to the parallel FFTs (by changing bcon.f) to eliminate high-k errors due to transforming a non-periodic function... Also changed bcon.f to properly handle the special case of a single n mode (nd=1 runs, which might be used for linear runs) which has no other mode to connect to. 1994/03/31 20:40:19 Dorland Changed Ln normalizations and a few minor things. It is much easier to normalize most quantities to the electron density scale length rather than the otherwise arbitrarily chosen first ion species' density scale length. This has been done, and makes no difference except in multi-species runs. Note that eps_n is now defined to be eps_n == L_ne / R and the scale factor for the non-dimensionalization of the equations is now rho_i / L_ne where rho_i is the gyroradius of the first ion species. Time is now normalized to v_t / L_ne and k_par is also normalized to L_ne. Finally, the input variables Ln(s) = L_n(s) / L_ne. Other minor changes I made are (1) fixed some array dimensions to accomodate restarts in the correlation diagnostics, (2) removed overall offset in n=0 component of Phi in the initialization, (3) moved Courant condition check out of nlpsc and into a subroutine (to facilitate later multitasking optimizations over calls to nlpsc), (4) changed alog(x) -> alog(x+1.e-15) to avoid crashes in postc, and (5) rewrote multispecies parts of itg.in to reflect the changes and to improve the clarity. The total affected files were: timestep.f postc.f pertb.f nlpsc.f input.f itg.f courant.f itg.in Makefile 1994/02/27 05:29:49 Dorland Fixed restart bug and improved multispecies diagnostics. There was a bug in the restart sequence from the recent revisions of the code that is now fixed. init.f was the affected routine. I also upgraded the thermal flux diagnostic to handle multi-species. The thermal flux plot from postc has a total flux (in units corresponding to the first ion species) and the flux from each ion species. Corresponding minor changes to itg.f and prstep.f were included. 1994/02/21 20:19:09 Dorland Postc upgrad to multi-species begun. itg.in upgraded to multi-species. I changed some array structures and subroutines in post to make it run with multiple species. The diagnostics themselves still do not reflect the extra species where relevant. The linear diagnostics are okay, though, since they depend on the Phi arrays. I also improved the input format for the additional species. All input is now in itg.in as before. 1994/02/18 03:36:35 Dorland Optimization of multi-species code. In each of timestep.f, filter,f and flrcow.f I changed the outer loop indices to be nd. The compiler selects this outer loop for multitasking, so this is more efficient as long as nspecies << nd. The changes in poisson.f are more substantial. The only way that the densities and temperatures enter the quasineutrality constraint is under the species sum, so I switched the loops around to take advantage of this. Also, I simplified the iphi00=2 loop for nonlinear runs just to make sure that the compiler was getting it right. 1994/02/17 5:55 GMT Dorland Multi-species capabilities begun. -------------------------------------------------------------------------- Caveats: I haven't benchmarked the multi-species part of the code, but I am presently doing so. I have compared nonlinear 4+2 models with a single species and the differences are strictly round-off, arising from a slight change in poisson. The stupid compiler caused me no end of trouble, as I had to add a couple more special protected regions. One could improve the efficiency and the compiler performance with a little more thought. The notes below outline what has been done and what remains to be done. No changes have been made to postc at all. -------------------------------------------------------------------------- Multiple species (max nspecz, set in itg.par) can be handled under the assumption that they are Maxwellian to zeroth order. The new variables that control the non-primary s^th species are normalized to the primary ion species (i) values as follows: tau(s) ....... T_s / T_i rmass(s)....... mass_s / mass_i charge(s)...... Z_s / Z_i = Z_s (Primary species assumed to be hydrogenic.) The relative concentration is specified with respect to the electron density: n_I(s) ....... n_0s / n_0e The driving gradients are expressed in terms of the ratios eta(s) ....... L_{ns} / L_{T_perp s} eta_par(s)..... L_{ns} / L_{T_par s} Ln(s) ....... L_{ns} / L_{ni} ****<---- See comments of 3/31/94**** Two quantities are derived from the inputs and used often: vt(s) ....... v_{ts} / v_{ti} rho(s) ....... rho_s / rho_i Normalizing each species to its *own* density, v_{ts}, etc., (unlike in my thesis, where there is an error in Appendix E on this point) one finds that the following operators transform with the scaling factors: grad_par -> grad_par * vt(i) omega_* -> omega_* * Ln(i) omega_d -> omega_d * rho(i) * vt(i) accel. -> accel. * charge(i) / tau(i) b.grad B -> b.grad B * vt(i) nu_ii -> nu_ii (I didn't update this dependence.) b.grad b. v_E -> ?? I don't understand this term. Where does it appear? I did not yet work out the proper nu_ii dependences, and I don't understand the last operator (and hence didn't do anything to it). Also, someone should check my finding that b.grad B -> b.grad B * vt(i) since I am not as familiar with the toroidal geometry terms as you two are. Electrons are presently assumed to be strictly adiabatic, although I suppose one could include an electron species in the above structure for a stiff price (since the FLR terms are all calculated). The old i-delta model could be re-inserted easily, too. I haven't decided on the best way to input all of this data yet, especially within our namelist format, so right now one must hardwire the various values in the subroutine init.f (near the end). Furthermore, no check for self-consistency is performed, i.e., it is not required that sum{Z_s n_s} = n_0. s This should be included in the future. Files that were affected: Makefile bcon.f filter.f flr.f flrcow.f flrinit.f (new region of cfpp$ skip R needed here!) history init.f (was cnstnts.f but I hate that name.) itg.cmn itg.in itg.f kparfft.f kxspacec.f nlpsc.f nonlin.f pertb.f poisson.f (new region of cfpp$ skip R needed here!) preenr.f prstep.f timestep.f (nl_qperp3 included now + species changes) volsq.f wframe.f wpunch.f wread.f xspacec.f I did not attempt to optimize the changes, and since the outer loops have generally changed from the nd loops to nspecies loops and nspecies << nd, we need to do something more to get as efficient multitasking as before. Finally, I uncommented nl_qperp3 in timestep.f. This term has been calculated all along but not added in in timestep.f. ************************************************************ To make detailed comparisons with previous nonlinear 4+2 runs, comment this term out again or the answer will change slightly. ************************************************************ 1994/02/15 16:35 GMT Beer Correlation functions in extensions Added section to prstep.f to completely incorporate the BCs in the extensions. Now Phi(z) along the field line is periodic in the extensions. 1994/02/14 18:29 GMT Dorland Poisson bracket definition Changed sign of nonlinear terms calculated in nlpsc.f and also in timestep.f to be strictly correct. This changes nothing in the actual equations (the sign changes cancel), but makes nlpsc now calculate the true Poisson bracket instead of the its negative and improves the readability of timestep.f. 1994/02/14 1:00 GMT Hammett CM-5 compatibility Cleaned up antiquated left-over code from prtime.f for CM-5 compatibility. 1994/02/13 21:32 GMT Beer Correlation functions Added correlation function diagnotics, and a new input parameter, n0, the fraction of the toroidal direction covered by the box. For iperiod=2, n0 changes the phase of the parallel BC's--if n_0 q_0 is an integer, it has no effect. It doesn't seem to have any noticeable effect nonlinearly, even if n_0 q_0 is non-integer. To make contact with previous results, just be sure to make n_0 q_0 an integer. Also moved all output from postc into *.post (I had been creating other files to make nicer plots with mongo), expanded the namelist in echoin.f, and updated the default closure coefficients in input.f. ************************************************************************* 1994/02/09 07:35 GMT Hammett CM compatibility changes. Mostly changes to move towards portability to the CM-5 Connection Machine at Los Alamos. The two main portability changes were shorter lines (to avoid a 72 character limitation on the CM-5) and eliminating pseudo-3D indexing of 1D arrays. For example, in flrcow.f, I replaced do n=1,nd do m=1,md do l=1,ld-1 index=m+(n-1)*md+(l-1)*md*nd b(l,m,n,1)=a(l,m,n,1)*wgt(index) with do n=1,nd do m=1,md do l=1,ld-1 b(l,m,n,1)=a(l,m,n,1)*wgt(l,m,n) This is just as fast on the CRAY, and the CMAX convertor is able to recognize this form to convert it to a Fortran-90 Array operation for the Connection Machine. (CMAX is unable to convert the first form to a Fortran-90 array operation. It instead runs as a very slow do-loop on a single processor...) Similar indexing changes were made in flrinit, flrcow, poisson, and filter, and the relevant arrays were dimensionsed as 3D in itg.cmn. At the same time, I took out the Log RCS keyword from a few files, and also deleted the long comments this automatically generates of the history of edits to the files. I did this to remove ancient comments which weren't relevant any more (just as Bill deleted the long ancient history section from the beginning of itg.f...) One can always look at these RCS log entries by typing "rlog subname.f" if one wants to. (Actually, some of the automatic comments I removed came from before 1992/06/02, from previous pre-toroidal versions of the files in previous RCS libraries maintained by Bill. If desired , those comments could be seen by looking at version 3.0 or before of the code, with the RCS command "co -r3.0 subname.f".) I did compare this new version of the code with the older version on a couple of standard test cases. The results were identical to the last bit (no roundoff errors). ************************************************************************* 1994/02/05 03:08 GMT Hammett Optimization improvements to yspace, kyspace, xspacec, kxspacec, nlpsc, and mcfft. Reordered some of the nested loops to put the best loop for vectorization on the inside. But even then, for some reason the CRAY compiler sometimes picks the wrong loop to vectorize out of a set of nested loops, and it is necessary to add a compiler directive to help it select the best one. These changes improved the performance of some routines by a factor of 2-4, though the speedup of the overall code is not very dramatic. With these changes, the complex version of the code (version 3.0+) is just as fast as the last cosine/sine version of the code (version 2.0+), which are both faster than the earlier versions of the code (before 1994/02/01) because of improved dealiasing and real-to-complex FFT's. Also changed some of the notation to be a little clearer. Modified compare.f to gracefully handle wrong runname. Modified xitgc to gracefully handle wrong runname. ************************************************************************* 1994/02/03 dorland RCS checkin of conversion of itgc to complex notation along with a few minor changes to the common blocks (removing very old unused variables) and to the output files. All of the concurrent changes (improved dealiasing, real-to-complex FFT's, optimization, etc.) by Greg have been incorporated in this version. This version should be revision 3.0 in the RCS library in Mike's afs/RCS area. Important note: the C-90 (parallel) version of the code is the default with this revision, so you should use the Makefile 3.0 or higher or confusion can result. That is, itgp.cmn -> itg.cmn and so on. I have not translated the NAG FFT calls to complex notation. Thus, for example, kparfft.f 2.0 used NAG FFT's and real notation; kparfft.f 3.0 calls Cray FFT's and uses complex notation. kparfftp.f 2.0 uses Cray FFT's and real notation. kparfftp.f 3.0 == kparfftp.f 2.0 since no changes were made in that routine. Almost all of the .f files belonging to itgc have been changed. This initial changeover to complex notation preserves the results of the previous version of the code (to round-off) in a pair of 3D nonlinear test cases that use the 3+1 and 4+2 models. The only differences introduced in the results come from treating the parallel pressure as n + T_par = p_par in timestep.f. Previously, the parallel pressure was calculated in the main block of code and passed to timestep; this was unnecessary and is no longer done. * A minor bug was noted but not fixed: in the q_perp equation, there should be a nonlinear term \hat{\hat{grad}}_perp^2 v_\psi dot \grad q_perp This constitutes an nonlinear FLR correction to the q_perp equation and should not be very important. It was previously calculated but not included in timestep.f (wdqc,wdqs), so the timing of the code in the 4+2 model should not change as a result of including this nonlinear term. It can be included simply by uncommenting the nl_qperp3 term in timestep. However, with this correction, a better solution is to note that the operator \hat{\hat{grad}}_perp^2 always occurs in the form 1 + \hat{\hat{grad}}_perp^2 and to change flrwgt2 accordingly. This reduces by two the number of nonlinear terms calculated in the 4+2 model, and has been implemented in the slab version of this code. * Many of the variables in the code have been renamed. Here is a translation table: (wc,-ws) -> density (vc,-vs) -> u_par (tparc,-tpars) -> t_par (qparc,-qpars) -> q_par (tc,-ts) -> t_perp (qc,-qs) -> q_perp (uc,-us) -> potential (wco,-wso) -> old_n (vco,-vso) -> old_upar (tparco,-tparso)-> old_tpar ... (uco,-uso) -> old_pot * The complex number i was added to the parameter list in itg.par as the variable 'ii'. * The version of the code suitable for the C-90 is now the default version. Thus, I have taken itgp.cmn -> itg.cmn, *p.f -> *.f, etc. and changed Makefile accordingly. I did not update nlpmp.f as it has not been updated for the toroidal code and is not presently used. * The subroutine real_comp was added to translate between the old notation and the new. The itg.res file is presently written in the old format for easy comparison with the past and because postc.f has not been upgraded to complex notation. * Some vestigial variables were removed from itgc and postc: rmu2, rkap, tkap, qkap, qkappar, etaik, etaik1, eb, fb, vdely, vdelz, dtmax, imp1, imp2, imkap None of these variables was used in the present slab or toroidal itg codes (except for vdely and vdelz, which were not used in the toroidal code and need to be thought about before including them in the toroidal coordinates). * The variables dtmin, vy and vz do not do anything but are kept to allow old input files to work; they didn't do anything then and they don't do anything now. * Two output files from itg were eliminated because they were not used: itg.tim, itg.ana and the rest are opened and closed only where needed. * Greg's compare code was updated to accomodate the new itg.cmn variables. * The Fourier transforms in the nonlinear sections are done in place to save a little space. * Many references to electron arrays were commented out since a toroidal derivation needs to be completed before we use this option. * No changes were made to the *rw.f files. They should not be used without being changed over first! * The dash array in postc had a bug for nd>10 and was reconfigured. * The file y10res is not opened since we weren't using it. * The subroutines copy and spread were not being used by itgc and were removed from the Makefile. * Since we are using the new compiler I changed the Makefile options for compare so that the FFLAGS variable applies to it, etc. (I missed a bug for a while because I didn't notice that the FFLAGS option wasn't being used for compare.f). ************************************************************************* 1994/02/03 20:50 hammett Final? cosine/sine release date Version 2.0 of all the source code, on the date above, is now the final cosine/sine version of the code. I did this by typing: co -l RCS/* ci -f2.0 * The one additional change from below is that Makefile uses the default compiler rather than the old one. ************************************************************************* 1994/02/03 20:05 hammett Final? cosine/sine release date This is a fairly stable release of the cosine/sine version of the code. I will try naming this release real940203, with the command: rcs -nreal940203: RCS/* Bill is about to checkin the modifications to upgrade from cosine/sine to complex notation, so this may be the final cosine/sine version. The changes in the present release are for OPTIMIZATION and a COMPILER BUG WORKAROUND: OPTIMIZATION: Rewrote a few loops in yspacep.f, kyspacep.f, xspacecp.f, kxspacecp.f, and nlpscp.f, to get better efficiency. I improved the speed of some of these routines by a factor of 2-4, and the speed of the overall code by 25% (for a test case with negligible i/o). I used the profiling tools ("hot spot analyzer") prof and profview to see where some optimizaiton might help. Some of the main tricks I learned can be seen by looking at one particular loop in yspacecp.f. It originally looked like this: do m=2,md cfpp$ select (vector) do imode=1,nalias*ldb n=(imode-1)/ldb+1 l=imode-(n-1)*ldb index=l+(m-1)*ldb kxa4(m,imode)=kxa1(n,index)/2. enddo enddo and I changed it to this: do n=1,nalias do m=2,md cfpp$ select (vector) do l=1,ldb imode=l+(n-1)*ldb index=l+(m-1)*ldb kxa4(m,imode)=kxa1(n,index)/2. enddo enddo enddo The C-90 is efficient at optimizing linear indexing functions (which can be linear functions of do-loop variables), which imode and index are in the second version. But it is apparently significantly slowed down by the "nonlinear" calculation of n (because it relies on integer division truncation conventions) in the first version. Usually, only the innermost loop is vectorized, and ldb is usually large enough (>= 64) for efficient vectorization. For some reason though, fpp tries to change the choice of loop to vectorize, so we force it with the cfpp$ directive. When compiled with -Zv, the next outer loop is sometimes unrolled. For large runs, md is usually big enough (>6-8) for this to be maximally efficient. Finally, when compiled with -Zp, the outermost loop is usually parallelized. nalias is usually larger than the number of processors, so this will also be most efficient. (It appears that automatic loop unrolling gets turned off by the -Zp cf77 switch. Some improvement might be achieved by turning it back on by hand, or by compiling with -Zv unless you are sure you are going to run parallel?) COMPILER BUG WORKAROUND: I've put a compiler directive into cnstnts.f to turn off vectorization/auto-tasking for the loop which the current version of cf77/fpp can't handle (the bug Mike had found). This enables us to use the current version of cf77 if we want. I verified that I get the same answers if I compile with the new cf77 with or without optimization (-Zp) and the old cf77 with or without optimization. I have not put the new cnstnts.f into rcs because Bill is in the middle of a massive conversion to complex, but if you want to see it, it is in ~hammett/afs/itgc/cnstnts.f. I've also verified that the plots made postc (and the *.post file, to within roundoff) agree if post is run with or without optimization with the new compiler. It is probably good to occasionally compare runs with and without optimization (and compare with old runs with previous versions of the compiler) to try to avoid compiler bugs... ************************************************************************* 1994/02/02 1:20 GMT hammett Modified itg.f to initialize qparco, qco, etc., to avoid a spurious "uninitialized variable" error in timestep.f when using a 3+1 model which doesn't need these variables. There was no actual bug in previous results. ************************************************************************* 1994/02/01 19:35 GMT hammett Upgrades to do more efficient dealiasing and to do more efficient real-complex FFTs. We originally did the dealiasing by doubling the # of k modes, while it is only neccessary to add 50% more modes. This gives a factor of (2/(3/2))**2 = 1.8 speedup. The parallelized FFT's (xmcfft) originally did only complex-to-complex FFT's. By making use of faster complex-to-real FFT's on the (x,ky,z)->(x,y,z) step (yspace.f), and real-to-complex FFT's on the (x,y,z)->(x,ky,z) step (kyspace.f), we were able to speed up the code by another factor of 1.4. The total speedup in the code from these two upgrades is a factor of 2.5. Old namelist files will still run, though they will be inefficient. To take advantage of the more efficient dealiasing it is neccessary to add the variables nffty and nfftx (documented in itg.in) to the namelist file, and then comment out the old values of md and nd. The actual non-zero k modes evolved span from -kmax to kmax, where kmax=(nfft-1)/3. For example, setting nffty=32 and nfftx=32 means that kx=-10 to +10, and ky=-10 to +10 are kept. (Actually, ky<0 modes are determined by the reality condition.) At the same time, I made a few minor changes to formats, or comments. Also made the CFL time-step adjuster be a little wiser (a modification Bill had already implemented in the slab code). I did verify that the new faster FFT methods agree with the older methods to within round-off error. One subtlety is that the CFL condition used to adjust the time step depends subtlety on the how many extra k modes are kept to do the dealiasing. The CFL condition requires finding the max(velocity) on the real-space grid. As one keeps more zeroed k-modes (less efficient dealiasing), one gets more points in real space and so resolves peaks a bit better. This causes the time steps to be different, and the resulting fields can appear to be very different, but that is because they are at different times. By setting dtadj=1.e20 or idt=0 in the namelist, one can force the timestep to be constant. Then the results will agree to 13 decimal places... ************************************************************************* 1994/01/29 06:47 GMT hammett Changed wpunch.f, wread.f, and postc.f to maintain backwards compatibility with older *.res files for now. Eventually, we need to make the *.res file self-documenting, so that a new postc can always figure out how to read whatever data exists in an old *.res file. We could do this with netCDF, or HDF, or something of our own construction... The res file should also be written unformmatted to keep from slowing down big production runs (formatted i/o is 250 times slower than formatted i/o on the CRAY, according to the optimization doc.) Yet I don't want to get too complicated... ************************************************************************* 1994/01/29 04:43:03 GMT hammett RCS checkin of a bunch of changes Bill had made: Mike and Greg, I made some changes to the toroidal code. The details are given below. The main improvements are that I found two bugs (one in code used only for comparisons with Ron and one minor array bounds error) and the new itgc code is 10% smaller with (lz,mz,nz)=(65,17,33). The affected files are: Makefile cnstnts.f flr.f flrinit.f (already given to Mike) itg.cmn (I actually changed only itgp.cmn.) itg.f itg.mom (new include file; will be useful later) itgp.cmn nlpsc.f (just changed the calling arguments) nlpscp.f (just changed the calling arguments) nonlin.f (new subroutine to calculate nonlinear terms) poisson.f stochf stochasticity variable for adiabatic electrons post.cmn postc.f preenr.f prstep.f timestep.f wpunch.f wread.f input.f Would somebody please check these changes into RCS for me? I don't have the requisite privileges. Here is a description of the changes that I made: (1) moved nonlinear terms into a subroutine called nonlin. --> This revealed an apparent bug in the nonlinear terms meant to --> model Ron's equations (unless he was intentionally not second- --> order accurate). (2) removed the variables k***c, k***s, ak***c, ak***s from the common blocks since perpendicular derivatives are easy to calculate in these coordinates. These variables appear in the routine nlpsc only (if at all). (nlpscp is changed too). (3) moved the u*c, u*s variables into global common, simplifying the calls to flr, preenr, prstep and timestep. (4) Commented out rp and sp variables since we won't be using them any time soon. They occurred in itg, wpunch, wread, postc and timestep. (5) Got rid of a lot of statement numbers from the orginal H&H code to improve the visual formatting. (6) Added my diagnostic (flux vs. |phi|) to postc. (7) rkx was dimensioned to be lz-1 in the first index but was written as high as ld in cnstnts. When ld=lz this was a bug. (8) I reordered the fields in wpunch, wread and postc to be more logical. -------------------------------------------------------------------------- I benchmarked the new code and the results are exactly the same as the old code (modulo the bugs). There is a new variable "stochf" that controls the electron term in a trivial way. The default value of stochf=0. leaves the code unchanged. Oh, I also changed the frequency with which timesteps are written to the screen. Just a matter of taste that can easily be ignored. On the note that Greg just sent out: I pointed out the factor of 1/sqrt(malias) just before I left Princeton and Mike seemed to know about it. The multiplier in the slab code has thus been 0.1 for a few months, as you expected. I should have mentioned it to you, too, Greg, but I forgot in the mad rush at the end. Of course, the z direction in the slab code didn't enter into the Courant calculation, so it was only the malias dependence that I knew about. Bill **************************************************************************** 1994/01/27 11:00:00 GMT hammett "First afs Release date" This date can be used to obtain a version of the code which had been fairly stable for a while, and had been known to work fairly well, by typing: co -q -M -d"1994/01/27 11:00:00 GMT" RCS/* The only changes I made by this date since 1994/01/17 were a few changes to Makefile, and a few improvements to comments in itg.in. ***************************************** 1994/01/25 mbeer Created an RCS/RCSOLD subdirectory to hold antiquated files which are no longer needed. (This allows one to do a "grep -i variable *.f" command to see how a variable is used in current code, without having to wade through how it was used in old code as well...) ***************************************** 1994/01/17 hammett We moved Mike's RCS directory for itgc from the theory sun cluster to afs, to allow easier interaction the the NERSC crays, and easier sharing of files between the 3 of us. This was the standard version which Mike had been using for his toroidal runs, including the parallel version using xmcfft for SPP runs.