$Id: README,v 1.7 2003/12/18 21:25:49 pletzer Exp $

GACO
====

Title:

GACO=GAbor COllocation: Solve ODEs and PDEs by expanding the solution in Gabor wave packets.

Keywords:

Gabor wave packets, collocation, differential equation.

Language:

Fortran90 and C


1. Abstract
2. Building instructions
3. Test suite
4. Code organization
5. Documentation
6. Anatomy of a driver
7. Assembly step
8. Loose ends
9. References

1. Abstract
-----------

GACO=GAbor COllocation, is a package written in Fortran95 to solve ODEs and PDEs based on expanding the solution in Gabor wave packets. 

Gaco uses the collocation method, the test functions upon which the equation is projected onto are Dirac delta functions. This makes numerical integration trivial, it also eliminates inaccuracies associated with numerical quadrature. 

A Gabor wave packet is a sinusoidal exp(i *k*x) wave with a Gaussian envelope exp(-sigma*(x-z)^2/2):

g(k,z; x) = exp(I *k*x) * exp(-sigma*(x-z)^2/2) (I is the imaginary sqrt(-1) number)

The expansion takes place both in frequency space, k = j*delta_k with j=-Nj..+Nj (like FFT) and in configuration (real) space. The expansion in configuration space comes from shifting the envelope position, z = i*delta_x i=0..Nz, (similar to the finite element method) across the domain. The Gabor wave packet expansion differs, however, from FFT in that the solution need not satisfy periodic boundary conditions. The Gabor wave packets also have the advantage to be smooth to all order (contrary to finite elements). Therefore, GACO can be used to solve ODEs/PDEs of, in principle, arbitrary order (a limit of 10 presently applies).

The main drawback of using Gabors is that the stiffness matrix is less sparse than that obtained using finite elements. (Formally the stiffness matrix is full, however, finite machine accuracy renders the matrix sparse.) The sparsity pattern depends the overlap between Gabors. Because the Gabors decay faster than exponentially away from z, overlapping contributions can often be neglected. For example, the overlap of two Gabors that are 6-7 times 1/sqrt(sigma) distance away only contribute to < 1e-10.

The main advantage of GACO over finite elements is that fewer degrees of freedom are in general required to capture the solution. This is particularly true when the solution has small scale oscillations.

The critical parameters in the Gabor expansion are:

delta_k (grid resolution in k-space)
delta_x (grid resolution in x-space)
sigma   (inverse sqrt(width) of the Gaussian envelopes)

The parameter delta_x is given by the number of Gabor envelopes (Nz+1). Frame theory [1] requires that delta_k * delta_x < 2*pi in order to be able to reconstruct the solution. It is convenient to define the sampling parameter

delta_k * delta_x  = 2*pi/sampling 

Stable reconstruction means sampling > 1. Note that this relation does not depend on sigma. Typically sigma is chosen to be ~1 /delta_x^2. In the limit of sigma -> 0 one recovers FFT. In the limit where sigma -> infinity the Gabors approach Dirac functions. 

A sampling value close to 1 allows to scan the biggest region in k-space and so extract small features most economically. However, larger sampling values confer more stability and robustness to GACO. 


2. Building instructions
------------------------

You need a C and Fortran95 compiler as well as Gnu make (gmake). The code has been developed on a Linux Box but should be portable across UNIX. It compiles with lf95 6.1 (Lahey-Fujitsu) and ifc 7.1 (Intel). 

The following packages are prerequisites:

NetCDF: http://www.unidata.ucar.edu/packages/netcdf/
EZcdf: http://w3.pppl.gov/rib/repositories/NTCC/catalog/Asset/EZcdf.html (used to save/load derived types to and from disk)
SuperLU: http://crd.lbl.gov/~xiaoye/SuperLU/ (used to solve sparse matrix problems)
GSL: http://sources.redhat.com/gsl/ (only required for a test problem)
Python: http://python.org (doc generator)

Optionally (for graphics only):

Plotmtv: http://aixpdslib.seas.ucla.edu/packages/plotmtv.html

To build GACO, the user must provide the location of the above packages as well as the name of the compiler, and so forth, while invoking gmake. This could be either done by editing the Makefile, or by passing arguments to gmake.

To build Gaco with ifc, which produces the fastest code, I type

gmake F90=/usr/local/intel-7.1/compiler70/ia32/bin/ifc NTCC=/usr/ntcc/ifc GSL=/usr/local/gsl-1.4 SUPERLU=/usr/local/superlu CLIB=-lPEPCF90

(You may need to set your LD_LIBRARY_PATH variable: setenv LD_LIBRARY_PATH /usr/local/gsl-1.4/lib)


To build Gaco with the Lahey-Fujitsu compiler, with all debugging switches turned on, I type

gmake F90=/usr/local/lff95/bin/lf95 FOPT="-g --chk a,e,u,s --trace --trap" NTCC=/usr/ntcc/lff95 GSL=/usr/local/gsl-1.4 SUPERLU=/usr/local/superlu

(You may need to set your LD_LIBRARY_PATH variable: setenv LD_LIBRARY_PATH /usr/local/lff95/lib:/usr/local/gsl-1.4/lib)

To clean up the directory, type

gmake reaclean

(will delete *.o *.mod and the executables), or 

gmake clean 

in order to delete only the *.o files.

Here, NTCC refers to the location of the EZcdf library (${NTCC}/lib), setting CLIB is required in order resolve the system library call.

3. Test suite
-------------

Gaco comes with two programs, tester and driver. The first program tortures various modules  while the second solves ODE problems. To get quickly started you may want to take a look at driver.f90.


4. Code organization
--------------------

Gaco is a collection of modules that combine to solve ODEs and PDEs with arbitrary number of components (unknowns) and in arbitrary dimensional space. A module contains a derived type (similar to a struct in C) and subroutines that act on this type (similar to methods in C++). A module is roughly equivalent to a class in object oriented languages. 

For module XYZ, the derived type is typically called XYZ_obj. The first argument in the method call must be an instance of type XYZ_obj. A module XYZ has a constructor (XYZ_init) to allocate and initialize the object XYZ_obj, and a destructor (XYZ_free) to reclaim the allocated memory.

Note that each call to XYZ_init should accompagnied by a call to XYZ_free, or your program will leak memory.

Modules also have a XYZ_test method, which illustrates a typical use of the module (in addition to testing it).

Some modules allow the object to be saved on disk via XYZ_save and retrieved from disk via a XYZ_load call. The parameters are set via 'XYZ_set...' calls and the result of calculations are typically extracted via 'XYZ_get...' calls.

Examples of modules:

indexer (indexer.f90):  Maps a multi-index expansion into an expansion using a single index. 
 
kzgrid (xkzgrid.f90): Given a single index, compute the wave-vector k and the Gabor position.

xgrid (xgrid.f90): Given a single index, compute the collocation position.

operators (operators.f90): Define a set of operators to solve ODEs/PDEs.

gabor (gabor.f90): Computes the Gabor functions in 1 and N-dimensions.

sparseC16 (supralu_mod.f90): Sparse matrix representation.

plotmtv (plotmtv.f90): Some basic plotting utilities. 


5. Documentation
----------------

All modules, routines and types are documented in doc.txt.

A "live" documentation system is available. 

To generate the complete documentation, type

doc --all 

To show the file structure (modules, subroutines, etc), type:

doc --file <file.f90> 

To show the module structure (type member, subroutine methods), type:

doc --module <module_name> <file.f90>

To show a routine (function or subroutine) calling syntax, type:

doc --routine <routine_name> <file.f90>


6. Anatomy of a driver
----------------------

The user is encouraged to take a look into one of the example driver programs, for instance airy.f90 which contains:

MODULE  airy
        TYPE  airy_obj
                SUBROUTINE airy_init(this, Nz, Nj, sigma_dx2, sampling)

                SUBROUTINE airy_free(this)

                SUBROUTINE airy_set_domain(this, xmin, xmax)

                SUBROUTINE airy_set_tol(this, tol)

                SUBROUTINE airy_set_equation(this, alpha)

                SUBROUTINE airy_set_bcs(this, bc0, bc1)

                SUBROUTINE airy_assemble(this)

                SUBROUTINE airy_solve(this)

                SUBROUTINE airy_get_solution(this, x, y)

                SUBROUTINE airy_plot(this, x, y)

                SUBROUTINE airy_test(this)

The derived type captures all the parameters needed to solve the problem. 

  type airy_obj
     integer :: Nz ! No of Gabor intervals
     integer :: Nj ! Fourier modes -Nj .. + Nj
     real(r8) :: sampling  ! sampling rate > 1
     real(r8) :: alpha ! equation parameter
     real(r8) :: tol ! sparsity tolerance
     complex(r8) :: bc0, bc1 ! Boundary conditions
     real(r8) :: xmin, xmax ! domain boundary

     real(r8) :: eps
     real(r8) :: sigma
     integer :: Nx
     type(sparseC16_obj) :: amat
     type(indexer_obj) :: iter2
     type(kzgrid_obj) :: kzg
     complex(r8), dimension(:), pointer :: xvec, bvec

  end type airy_obj

The first group of parameters (Nz -> xmax) define the discretization, geometry, and equation parameters (which vary from problem to problem).  The data members of type airy_obj are populated via

type(airy_obj) :: this
...

SUBROUTINE airy_init(this, Nz, Nj, sigma_dx2, sampling)

and, optionally, via "setters"

SUBROUTINE airy_set_domain(this, xmin, xmax)
SUBROUTINE airy_set_tol(this, tol)
SUBROUTINE airy_set_equation(this, alpha)
SUBROUTINE airy_set_bcs(this, bc0, bc1)


The airy_init method will set reasonable default parameters. Note the parameter "tol", which sets the sparsity tolerance. That is, a matrix element whose absolute value is < tol will not be filled in. Typically, "tol" should be comparable to the overall numerical accuracy that is desired. A larger "tol" will yield sparser matrices and so be more efficient. 

The crucial method is 

SUBROUTINE airy_assemble(this)

which assembles the sparse matrix system. We'll come back to it later. The method

SUBROUTINE airy_solve(this)

should be called after airy_assemble; it solves the matrix system. The result is a vector of (complex) amplitudes. In order to get the solution "y" on an arbitrary set of points "x", you can call 

SUBROUTINE airy_get_solution(this, x, y)

The vector can be plotted using 

SUBROUTINE airy_plot(this, x, y)

When you are finished, just call

SUBROUTINE airy_free(this)

A complete test driver is available in 

SUBROUTINE airy_test(this)


7. Assembly step
----------------

Let's dive into SUBROUTINE airy_assemble. In general a problem will have "Nd" dimensions (e.g. Nd=3 in 3D) and "Nc" components (e.g. Nc=2 for two coupled equations). Unless otherwise specified, we will not assume a fixed "Nd" or "Nc". In "Nd" dimensions, the collocation positions, the Gabor k-vector and the Gabor envelope positions "z" will be vectors of size "Nd". Hence, the expansion of the solution will involve Nd (for k) + Nd (for z) + 1 (vector component) = 2*Nd+1 indices. The sparse matrix on the other hand will require all these indices to collapse to a single global index. The mapping between the 2*Nd+1 indices and the single matrix index is taken care by the "indexer" module.  

Let us be more specific and assume that the same number of Fourier modes holds across the domain. Let the k-wavevector indices be the vector j(1:Nd), the z-position indices be the vector i(1:Nd) and the component index be k. The (Fourier) j indices range from -Nj..+Nj where Nj is an integer vector of size Nd, which is constant across the domain. Likewise, i indices range from 0..Nz where Nz is a vector of size Nd denoting the number of Gabor envelopes in each dimension. (There will be different Nj's and Nz's for each dimension.) In a convential approach, the matrix assmebly would involve looping over each of the 2*Nd+1 indices (leading to 2*Nd+1 nested do-loops). Fortunately, there is a better approach. The user "constructs" an indexer instance using two vectors "sizes" and "starts" that determine the range of the indices. The memeory layout will be determined by th eorder in which these values are fed to "sizes" and "starts" (which should be mutually consistent). For example, if Nd=2 (x, y) and Nc=3 (Ex, Ey, Ez), a possible layout would be

type(indexer_obj) :: iter2 ! somewhere at the top of the unit
integer :: ier ! error flag (/=0 if problem)
...
! set the memory layout
sizes = (/ Nc, 2*Nj(1)+1, 2*Nj(2)+1, Nz(1)+1, Nz(2)+1 /)
starts =(/ 1,   -Nj(1)  ,  -Nj(2)  ,       0,       0 /)

call indexer_init(iter2, sizes, starts, ier)

which would indicate that indices vary fastest in components (Nc components starting at 1), then Fourier mode in x [2*Nj(1)+1 modes starting at -Nj(1)], Fourier mode in y [2*Nj(2)+1 modes starting at -Nj(2)], Gabor position in x [Nx(1)+1 values], and Gabor position in y [Nx(2)+1 values]. Here, "iter2" is the indexer object, a compound type that will take care of updating each index as we loop through the global index to assemble the matrix system. The advantage of using the "iter2" object over hand-coded, do loops are (1) only one level of do-loop is required as opposed to 2*Nd+1 nested do-loops and (2) the way the array of unknown is flattened can be modified simply by changing "sizes" and "starts". In particular, this will allow the user to experiment which layout is numerically more favorable, yielding a more diagonally dominant matrix and ultimately a faster solution for instance, with minimal fuss. Should the user desire to pack the envelope positions together, then the Fourier modes and finally the components, this would be achieved by setting

sizes = (/  Nz(1)+1, Nz(2)+1, 2*Nj(1)+1, 2*Nj(2)+1, Nc /)
starts =(/        0,       0,  -Nj(1)  ,  -Nj(2)  ,  1 /)


Similarly, another indexer object is required to map the collocation positions into a global matrix index. Depending on your preferences, you could choose:

type(indexer_obj) :: iter1 ! somewhere at the top of the unit
...
sizes = (/ Nc, Nx(1)+1, Nx(2)+1 /)
starts =(/ 1, 0, 0 /)
call indexer_init(iter1, sizes, starts, ier)
...


Indices are updated using the indexer_next method:


do i2 = iter2 % ntot
	call indexer_next(iter2, ier) ! update second set of indices
	...
	do i1 = iter1 % ntot
		call indexer_next(iter1, ier) ! update first set of indices
		...

where the data member ntot is just the product of the elements of "sizes". Note that the collocation iteration is inside the Gabor expansion iteration because the matrix elements, as we will see, need to be filled by running down the columns (column compress storage).

We've sen how the intricate nesting of do-loops contracts to two (i1 and i2) single do-loops regardless of "Nd" and "Nc" through the use of the indexer objects. We still need to access each index inside a do-loop in order to determine the k-wavevector, z-position and component inside the i2 iteration, as well as the component and collocation position x inside the i1 iteration. Assuming the grid to be uniform, we can do this by constructing an kzgrid object to map the indices into (k-wavevector, z-position), and a xgrid object to map the indices into x-position, respectively:

type(xgrid_obj)  :: xg
type(kzgrid_obj) :: kzg
integer :: x_map(Nd), kz_map(2*Nd)
...

! maps iter2 indices to k and z indices, resp.
! here assume sizes=(/Nc, 2*Nj(:)+1, Nz(:)+1) /)
do i = 1, Nd
	kz_map(i   ) = i   +1
       	kz_map(i+Nd) = i+Nd+1
end do
call kzgrid_init(kzg, xmin, xmax, Nz, Nj, sampling, kz_map, ier)

! maps indices of iter1 to x-indices
! here assume sizes=(/Nc, Nx(:)+1) /)
do i = 1, Nd
	x_map(i) = i + 1
end do
call xgrid_init(xg, xmin, xmax, Nx, x_map, ier)

Inside the i1 do loop we can now access the k-wavevector (kwav) and z-position (zpos)

	call indexer_get(iter2, all_ind2, ier)
	! all_ind2 contains i(1:Nd) and j(1:Nd) in some order
	call kzgrid_get(this % kzg, all_ind2, kwav, zpos, ier) 

To access the collocation position (xpos) in the i1 iteration we do:

		call indexer_get(iter1, all_ind1, ier)
		! all_ind1 contains el(1:Nd) in some order
          	call xgrid_get(xg, all_ind1, xpos, ier)
	

Neglecting the issue of boundary conditions for the time being, we're now in a position to fill in the matrix elements. As pointed out earlier it is essential to fill these column by column after incrementing each row. Naturally, only non-zero elements need to be stored allowing us to skip some rows. The matrix elements will be stored in a complex*16 sparse matrix compound type 

type(sparseC16_obj) :: amat

which is initialized via

base = 0
call sparseC16_init(amat, rank, nnz, base, ier)

where "rank" is the matrix rank (an integer equal iter2 % ntot above), "nnz" the expected number of non-zero entries (or larger), and "base" should at present be set to zero. With Gaco it may be difficult to estimate "nnz". For small problems that fit in memory you may simply want to set nnz = rank**2. The matrix elements are filled with the value "val" by 

		call sparseC16_set_next(amat, i1-1, newcol_flag, val, ier)

where "i1" is the global row index (the -1 is required because base=0) and newcol_flag a logical variable that should be set to .TRUE. when the column index "i2" changed. 

The complex "val" represents the ODE (PDE) operator applied on a Gabor function with given kwav (k-wavevector), zpos (envelope position) and evaluated at the xpos (collocation position). Derivatives of Gabors can be computed 

gabor_fct = GaborN(xpos, zpos, kwav, sigma)
call derivative(order, xpos, kwav, zpos, sigma, val, ier)
val = val * gabor_fct

where 0 <= order <= 10. GaborN and derivative are defined in the gabor, respectively operators modules. Feel free to add your own operators in the latter.

Boundary conditions need to be applied "by hand" (there is no universal data structure to help) during the assembly step. While it is possible to massage the matrix system after filling in the elements, this is not recommended as this may upset the matrix sparsity patterns. Boundary conditions of the form y^(n)(xbound) = const where n denotes the order of the derivative and xbound is typically either xmin or xmax, are implemented by setting the elements of a row to the nth derivative the Gabor functions g^(n)(k,z; xbound). Formally speaking, any row would do. We should keep in mind, however, that the collocation node associated with the row is effectively be removed by this process.


8. Loose ends
-------------

While GACO has been written with flexibility in mind and most elements are in place to solve equations in higher than one dimensions, all tests conducted so far were in 1D only.

A uniform (x, z) mesh is assumed. However, the Gabor collocation method does not preclude the use of a non-uniform mesh per se. It may be desirable in specific situations to pack the z envelope positions in some regions. Likewise, there could be some benefits in using a variable number of Fourier modes (which would prevent the use of the indexer object), or sigma paramater. For instance, more Fourier modes could be used in regions where the solution is expected to exhibit fine structures. It is also possible to pack the collocation points. In this case new data structures would be required to replace xgrid (when the collocation points are packed) or kzgrid (when the z-positions are packed). 

9. References
-------------

[1] "Ten Lectures on Wavelets" by I. Daubechies. (Cbms-Nsf Regional Conference Series in Applied Mathematics, No 61, May 1992).
[2] Some error convergence tests for the Gabor collocation method are available at http://w3.pppl.gov/~pletzer/wavelets/collocation/convergence/readme.txt