XGC1 is a particle-in-cell code with gyrokinetic ions and drift-kinetic electrons. It typically uses 5000 particles per cell, and the total number of particles exceeds 20 billion. An electron sub-cycling method was developed to push the electrons multiple steps for each ion push; this electron push takes up most of the computing time (>90%). These computationally heavy push subroutines are a good fit for the recently developed General Purpose Graphics Processing Unit (GPGPU) technologies. The XGC1 code was ported to the GPU-based TITAN supercomputer using CUDA FORTRAN.
In this talk, I will present the CUDA FORTRAN implementation and its optimization in XGC1 and show the resulting performance improvement on the GPU-CPU hybrid architecture.
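
The electron sub-cycling idea described above can be illustrated with a minimal sketch. This is a toy 1D explicit pusher written in Python for readability, not the XGC1 gyrokinetic or drift-kinetic push and not its CUDA FORTRAN code; the names push, step_with_subcycling, accel_i, accel_e, dt_ion, and n_sub are all hypothetical and chosen only for illustration.

```python
def push(x, v, accel, dt):
    """Advance one species by a single explicit step (toy 1D pusher, assumed form)."""
    # Update velocities from the acceleration at the current positions.
    v_new = [vi + accel(xi) * dt for xi, vi in zip(x, v)]
    # Update positions with the new velocities.
    x_new = [xi + vi * dt for xi, vi in zip(x, v_new)]
    return x_new, v_new


def step_with_subcycling(ions, electrons, accel_i, accel_e, dt_ion, n_sub):
    """Take one ion step while electrons take n_sub shorter steps over the same interval."""
    # Ions are pushed once with the full time step.
    ions = push(*ions, accel_i, dt_ion)
    # Electrons are pushed n_sub times with a proportionally smaller step.
    dt_e = dt_ion / n_sub
    for _ in range(n_sub):
        electrons = push(*electrons, accel_e, dt_e)
    return ions, electrons


if __name__ == "__main__":
    ions = ([0.0], [1.0])        # (positions, velocities), illustrative values
    electrons = ([0.0], [2.0])
    no_force = lambda x: 0.0     # force-free motion keeps the example easy to check
    ions, electrons = step_with_subcycling(ions, electrons, no_force, no_force, 1.0, 4)
    print(ions, electrons)
```

In this sketch the inner electron loop does n_sub times as much push work per particle as the ion push, which is consistent with the electron push dominating the run time and being the natural candidate for GPU offload.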