As a project in the G8 Research Councils Initiative on Exascale Simulations, we have been carrying out an international collaboration named NuFuSE (Nuclear Fusion Simulation toward Exascale) over the past two and a half years. One of our collaborations, between the University of Tsukuba and PPPL, aims to run the GTC-P code on large-scale GPU clusters using several language approaches.

At the Center for Computational Sciences (CCS), University of Tsukuba, we have been developing a large-scale GPU cluster named HA-PACS. It has a peak performance of 802 TFLOPS, plus an additional 330 TFLOPS partition enhanced with a dedicated hardware/software feature called the TCA (Tightly Coupled Accelerators) architecture. TCA is a novel solution that provides fast, low-latency direct GPU-to-GPU communication to speed up a variety of scientific codes.
In parallel, we have been developing a PGAS (Partitioned Global Address Space) language named XcalableMP-dev (XMP-dev), which lets users describe large-scale scientific codes easily and in a form well suited to large-scale GPU clusters. Our new challenge is to make the TCA feature directly available from XMP-dev, so that users can easily and smoothly program large-scale data-parallel codes that run on HA-PACS/TCA.
GTC-P has been selected as the first practical application to run on this framework. We have started by porting GTC-P to the base XMP language without GPU features; next, we will add the XMP-dev directives needed to execute it on the GPUs of HA-PACS. Finally, once XMP-dev/TCA is ready, the GTC-P code will automatically run on HA-PACS and take advantage of the full features of TCA. This work is still in progress, but here I would like to present its current status and near-future plans.