Programs for Programmers

HMPP Preprocessor

HMPP is a source-to-source translator. It uses a small set of OpenMP-like directives, added to the source code, to identify Fortran subroutines that can execute in parallel. Once identified, these subroutines can be explicitly offloaded to run in parallel on the GPU while the rest of the Fortran program runs on the host. During translation, HMPP automatically splits the source code into standard Fortran source and source intended for the GPU. Special drivers and product integration make HMPP largely transparent when it is used with Absoft Pro Fortran for Windows and Linux.

After translation is complete, the Fortran source is compiled by Absoft Pro Fortran, and the GPU code (for example, NVIDIA CUDA code) is compiled by the included data-parallel CUDA compiler. The two components are then automatically linked into a single, highly optimized, multi-threaded executable. On hardware with a CUDA-enabled GPU, significant speed increases are possible. On systems without a GPU, the code runs in the standard manner. This programming model lets software assets use a single source tree while preserving portability and hardware interoperability.

The directives used to identify parallel code segments are well defined and open source. They provide flexibility because any part of the application can be tuned while legacy code is preserved. They also allow the application to scale by leveraging stream and vector units and by adapting dynamically to execution on multi-GPU systems. In addition, HMPP allows rapid prototyping and hardware performance evaluation of critical functions during the development cycle.

A small example program that illustrates the HMPP CUDA GPU extensions is shown below. The graph on the right illustrates the performance increase.

HMPP CUDA GPU preprocessor

Based on C or Fortran directives, HMPP provides a high-level abstraction for hybrid programming and includes powerful data-parallel back-ends for NVIDIA CUDA that drastically reduce development time. The HMPP runtime ensures application deployment on multi-GPU systems. Software assets are kept independent of both hardware platforms and commercial software. While preserving portability and hardware interoperability, HMPP increases application performance and development productivity.