by Kevin Martins -- CPPG/Drexel Co-op Student, undergraduate Computer Science major
Large-scale simulations often take days to run, producing extremely large datasets. Traditionally, the simulation must complete before the researcher can begin shipping the output data, and with conventional methods such as FTP this transfer can itself take days, if not weeks. The software we are developing overcomes these limitations, drastically reducing the required transfer time. Using routines supplied by Globus, we have written Application Programming Interfaces (APIs) that allow simulations to stream data from supercomputers to PPPL's local clusters, where it is written out as HDF5 files. These APIs are designed to transfer the data without hampering the productivity of the supercomputer applications. Several MPI simulations have been run that generate data on a cluster at Princeton and transfer the resulting data to a cluster here at PPPL. These runs have shown that the transfer can keep pace with codes that generate one terabyte of data per day. We are currently integrating these APIs into the GTC code.
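The key idea above is that transfer overlaps with computation rather than waiting for the run to finish. The following is a minimal, hypothetical sketch of that pattern (the actual API built on Globus is not shown in this abstract): the simulation hands each output chunk to a bounded queue and continues computing, while a background thread drains the queue and ships the data; here the network send and remote HDF5 write are replaced by a stand-in local operation.

```python
# Hypothetical sketch of overlapping simulation output with data transfer.
# The queue is the hand-off point: the simulation never blocks on the
# network unless the transfer side falls more than two chunks behind.
import queue
import threading

CHUNKS = 5
q = queue.Queue(maxsize=2)     # bounded buffer: applies backpressure if transfer lags
shipped = []                   # record of what the "transfer" side processed

def transfer_worker():
    # Stand-in for the transfer thread: in the real system this would
    # ship bytes to the remote cluster for writing as HDF5.
    while True:
        chunk = q.get()
        if chunk is None:      # sentinel: simulation is done
            break
        shipped.append(sum(chunk))
        q.task_done()

t = threading.Thread(target=transfer_worker)
t.start()

for step in range(CHUNKS):
    data = [step] * 4          # stand-in for one timestep of simulation output
    q.put(data)                # hand off, then keep computing the next step

q.put(None)                    # signal end of run
t.join()
print(shipped)                 # → [0, 4, 8, 12, 16]
```

Because the queue is bounded, a slow transfer eventually throttles the producer instead of exhausting memory, which is the same trade-off a streaming API must make when the network cannot keep pace with the code's output rate.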