A Systematic Literature Review on Graphics Processing Unit Accelerated Realm of High-Performance Computing
DOI: https://doi.org/10.47941/ijce.1813
Keywords: Compute Unified Device Architecture, Graphics Processing Unit, High-Performance Computing, Performance Analysis, Radeon Open Compute
Abstract
Graphics Processing Units (GPUs) are widely used for their computational power and parallel processing capability, and they have shown significant potential for improving the performance of HPC applications: their highly parallel architecture allows many tasks to execute simultaneously. In practice, GPU computing has become nearly synonymous with NVIDIA's CUDA, which offers mature development tools and comprehensive documentation, while AMD's ROCm platform provides an application programming interface that is largely compatible with CUDA. The main objective of this systematic literature review is therefore to thoroughly analyze and compare the performance characteristics of two prominent GPU computing frameworks, NVIDIA's CUDA and AMD's ROCm (Radeon Open Compute). By examining the strengths, weaknesses, and overall performance capabilities of CUDA and ROCm, the review aims to give researchers a deeper understanding of both platforms. The purpose of this research on GPU-accelerated HPC is to provide a comprehensive and unbiased overview of the current state of research and development in the area; it can help researchers, practitioners, and policymakers understand the role of GPUs in HPC and support evidence-based decision making. In addition, real-world applications of the CUDA and ROCm platforms are discussed to explore the potential performance benefits and trade-offs of each. The insights provided by the study will enable well-informed decisions when choosing between CUDA and ROCm for real-world software.
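To illustrate the API compatibility noted above, consider a minimal sketch of a SAXPY kernel (a hypothetical example, not taken from any of the surveyed works): the device code is identical under CUDA and ROCm's HIP, and porting largely amounts to renaming runtime calls (e.g. `cudaMalloc` becomes `hipMalloc`), a translation AMD's HIPIFY tooling can automate.

```cuda
#include <cuda_runtime.h>

// SAXPY: y[i] = a * x[i] + y[i]. Under ROCm/HIP the kernel body is
// unchanged; only the host-side runtime prefix differs, as noted in
// the comments below.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));   // HIP: hipMalloc
    cudaMalloc(&y, n * sizeof(float));
    // ... initialize x and y on the host, copy with cudaMemcpy (HIP: hipMemcpy)
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);  // launch syntax is shared
    cudaDeviceSynchronize();             // HIP: hipDeviceSynchronize
    cudaFree(x);                         // HIP: hipFree
    cudaFree(y);
    return 0;
}
```

This source-level portability is one of the trade-offs the review weighs: ROCm lowers the cost of moving existing CUDA code to AMD hardware, while CUDA retains the more mature tooling ecosystem.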
License
Copyright (c) 2024 Rajat Suvra Das, Vikas Gupta
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution (CC-BY) 4.0 License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.