This page provides helpful hints for using NCL efficiently and effectively. This is not a complete and comprehensive usage FAQ, but it lists some very important issues that users should be aware of when using NCL. A general working knowledge of the NCL syntax is required for this page.
For some NCL usage examples, see the NCL scientific tutorial and the "pick-a-utility" section of the Quick Start Guide.
There are many features in NCL that when not used -- or used improperly -- result in inefficient data processing. Many of these problems result from the fact that NCL is an interpreted language and must do additional processing for each statement, subscript, and expression executed. Therefore a good rule of thumb is to strive to reduce the number of statements, subscripts, and expressions executed.
The most important principle to remember is that NCL is most efficient when its array operations are used. An array operation works on an entire array rather than on a subscripted element of the array. Consider multiplying two 2-dimensional floating point arrays dimensioned 100x100. One way would be to loop from 0 to 99 over both dimensions, subscripting individual elements of each array, multiplying them together, and assigning the product to a result.
    do i = 0,99
        do j = 0,99
            c(i,j) = a(i,j)*b(i,j)
        end do
    end do
Although intuitive for most programmers, this is the least efficient way to do this in NCL. All of NCL's algebraic operators allow arrays of similar dimensions and types to be used as operands. For example, the above loop could be rewritten as follows:
    c = a*b
By counting each statement, subscript, and expression in the original loop, you would get three subscripts, one statement, and one expression. These values must be multiplied by 10000 since there are that many iterations, bringing the totals to 30000 subscripts, 10000 statements, and 10000 expressions. The second example has one statement and one expression. By removing unnecessary loops and using NCL's array syntax in NCL source, very large performance gains can be made.
Sometimes it is impossible to completely remove loops. When this is the case, attention should be focused on removing scalar expressions or unnecessary expressions and statements inside of loops. Consider the following example:
    do i = 0,999
        do j = 0,999
            T(i,j) = 100.0 - 8 * sqrt(i^2 + j^2)
        end do
    end do
Using the counting scheme outlined above, this loop has one subscript, one statement, and six expressions (the function sqrt is counted as an expression). These must be multiplied by 1000000 iterations. Now, using a little algebra and knowing that the operators and functions can accept entire arrays as well as individual elements, the above loops can be rewritten as:
    do i = 0,999
        do j = 0,999
            T(i,j) = i^2 + j^2
        end do
    end do
    T = 100 - 8 * sqrt(T)
This version of the loops contains one subscript, one statement, and three expressions times 1000000 iterations. The additional line adds merely three total expressions, making the total just slightly over half the original. In this example, operations common to all elements of the array T were moved outside the loop. By the way, the above loops are not actually necessary. They can be completely removed by initializing T using ispan. This is left as an exercise for the reader.
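One possible answer to that exercise is sketched below. It assumes the function conform_dims is available in the NCL release being used (it is not mentioned on this page), and the names n, iv, ii, and jj are made up for illustration. The index vector is built once with ispan and expanded to two dimensions, so no loop is needed.

```ncl
begin
  n  = 1000
  iv = ispan(0, n-1, 1) * 1.0          ; 0., 1., ..., 999. (multiplying by 1.0 gives floats)

  ; Assumption: conform_dims(dims, r, ndim) expands the 1D array r to the
  ; shape dims, matching r to dimension number ndim of the result.
  ii = conform_dims((/n,n/), iv, 0)    ; ii(i,j) = i
  jj = conform_dims((/n,n/), iv, 1)    ; jj(i,j) = j

  ; One statement on whole arrays replaces both nested loops.
  T  = 100.0 - 8 * sqrt(ii^2 + jj^2)
end
```

The cost is two temporary 1000x1000 arrays, so this trades memory for speed.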
Avoid using new to create variables unless it is absolutely necessary. Consider the following source:
    T = new(filevardimsizes(file1,"T"),float)
    .
    .
    .
    T = file1->T
This causes an extra array the size of variable T to be unnecessarily allocated. The file variable reference, file1->T, needs to allocate an array the size of T in order to read it from the file. At this point there are two arrays. During the assignment, one array is copied to the other and then freed. This is unnecessary since NCL will implicitly define variables when they appear on the left side of an assignment. Therefore the above statements should be written as:
T = file1->T
This way the temporary value allocated by the file read is simply assigned to the variable T, so only one allocation occurs. One case where using new is unavoidable is reading from multiple files into one array. The following is an example of that:
    filenames = (/"a.nc","b.nc","c.nc","e.nc"/)
    file1 = addfile(filenames(0),"r")
    dims = filevardimsizes(file1,"T")
    T = new((/dimsizes(filenames),dims(0),dims(1),dims(2)/),float)
    T(0,:,:,:) = file1->T
    do i = 1,dimsizes(filenames)-1
        file1 = addfile(filenames(i),"r")
        T(i,:,:,:) = file1->T
    end do
NCL has a very powerful variable subscripting syntax with features not found in other languages. Learning when to use these features can be critical for writing efficient NCL source.
Consider the operation of transposing a two-dimensional array. One could write the following inefficient source:
    dims = dimsizes(T)
    Ttranspose = new((/dims(1),dims(0)/),float)
    do i = 0, dims(1)-1
        do j = 0, dims(0)-1
            Ttranspose(i,j) = T(j,i)
        end do
    end do
After taking the time to learn NCL Named Subscripting, the above can be rewritten in just three lines of source:
    T!0 = "x"
    T!1 = "y"
    Ttranspose = T( y | :, x | :)
Consider the operation of reversing the order of a dimension:
    dims = dimsizes(T)
    do i = 0,dims(1)/2 -1
        tmp = T(:,i)
        T(:,i) = T(:,(dims(1)-1) - i)
        T(:,(dims(1)-1) - i) = tmp
    end do
The above operation can be rewritten using only one line of NCL source:
    T = T(:,::-1)
It pays to take the time to learn NCL's unique syntax.
Disk I/O is usually much slower than memory access, so it is best to read a file variable into memory once rather than searching it or accessing its individual elements directly in the file. For example, the first script below will run much faster than the second because the search variable is not read from disk on every iteration of the loop. The file "sao.cdf" contains a two-dimensional character array called "id" in which the three-character station ids are stored. The loops look for the index of a specific id.
First loop:
    file1 = addfile("sao.cdf","r")
    id = file1->id                       ; read the whole variable from disk once
    do i = 0, dimsizes(id(:,0)) - 1
        if( id(i,:) .eq. "DEN") then
            break
        end if
    end do
    print(i)
Second loop:
    file1 = addfile("sao.cdf","r")
    do i = 0, dimsizes(file1->id(:,0)) - 1
        if( file1->id(i,:) .eq. "DEN") then   ; reads from disk on every iteration
            break
        end if
    end do
    print(i)
Each supported data format may have its own limitations with respect to operations provided by NCL. See the Supported data format information for specific information, conventions, and limitations on each data format in NCL. The following passages are recommendations for using files and working with them.
If you don't know the names of the variables in a file, use the function getfilevarnames to retrieve them. The syntax for using a string to read a variable is:
    names = getfilevarnames(file1)
    var0 = file1->$names(i)$
The '$' operator tells NCL to use the string values between the '$'s as the name of the variable to read. Similarly, attributes and coordinate variables can be read in this fashion.
It is often necessary to know the dimension sizes of variables in files. Never use the dimsizes function for this. This will cause the entire variable to be read in and then discarded. Use the filevardimsizes function for this.
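As a small illustration of the two functions just described, the sketch below lists the number of dimensions of every variable in a file without reading any data. The file name "a.nc" is a placeholder, and the loop is only one way to arrange this.

```ncl
begin
  file1 = addfile("a.nc", "r")               ; placeholder file name
  names = getfilevarnames(file1)             ; names of all variables in the file

  do i = 0, dimsizes(names) - 1
    ; Only the size information is requested; the data stay on disk.
    dims = filevardimsizes(file1, names(i))
    print(names(i) + ": " + dimsizes(dims) + " dimension(s)")
    delete(dims)                             ; the next variable may have a different rank
  end do

  ; Avoid the following form: it reads the whole variable just to get its shape.
  ;   dims = dimsizes(file1->$names(0)$)
end
```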
As of release 4.1, some new procedures have been added to better support writing files efficiently. NetCDF has performance problems that stem from how the data are arranged on disk. Attributes and ancillary information, if written after data, can cause a large number of file copies, which slow NCL down. By pre-defining the dimensions, attributes, and variables, much time can be saved. Essentially the rule of thumb is to define and write the data in the order it will be laid out in the file. The four procedures are filedimdef, filevardef, filevarattdef, and fileattdef. The basic strategy is to first define all the dimensions and then all the variables in the order they'll be written. If at all possible, define one dimension as unlimited. The unlimited dimension initially has no size, meaning any variables added to the file containing the unlimited dimension will also have no size. Variables with no size can be added without incurring a file copy. After adding each variable, add its attributes. By proceeding in this fashion, the file copies are minimized, and when eventual assignment to the variables occurs, file copies again are minimized.
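The strategy above might look like the following sketch. The file name, dimension names and sizes, and attribute values are all invented for illustration; only the four procedure names come from this page, and the argument order shown is an assumption that should be checked against the reference pages for each procedure.

```ncl
begin
  ; "c" creates the file; this fails if out.nc (a placeholder name) already exists.
  fout = addfile("out.nc", "c")

  ; 1. Dimensions first. A size of -1 with True marks "time" as unlimited.
  filedimdef(fout, (/"time","lat","lon"/), (/-1,64,128/), (/True,False,False/))

  ; 2. Global attributes are copied from the attributes of a dummy variable.
  gatts       = True
  gatts@title = "Example output"
  fileattdef(fout, gatts)

  ; 3. Define each variable, then immediately add its attributes.
  filevardef(fout, "T", "float", (/"time","lat","lon"/))
  Tatts           = True
  Tatts@units     = "K"
  Tatts@long_name = "temperature"
  filevarattdef(fout, "T", Tatts)

  ; 4. Only now write data. (/ ... /) writes values only, so no metadata
  ;    is added to the file after the data.
  T = new((/1,64,128/), float)
  T = 273.15
  fout->T = (/ T /)
end
```

With more than one variable, step 3 would be repeated for each, in the order the variables will be written.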
Because functions and procedures can only be defined once, it is important not to put them in scripts that are intended to be run multiple times from the same invocation of NCL. As of release 4.1, functions and procedures can be undefined using the undef procedure. When writing functions and procedures that will be loaded, it is probably a good idea to insert an undef call immediately before the definition.
    undef("mymax")
    function mymax(a,b)
    begin
    .
    .
    .
When writing scripts, it can be very useful to enclose the script in a block. This forces NCL to scan in the entire source of the script before executing any of the commands nested in the block. This means any syntax errors will be reported before commands are executed. For example, by placing a begin and end around the following, the syntax error after the long loop is detected before the loop is executed.
    begin
        tmp = new((/1000000/),float)
        do i = 0,999999
            tmp(i) = i
        end do
        asciiwrite("tmp.ascii",tmp(500000:)   ; deliberate syntax error: missing closing parenthesis
    end