光电工程, 2016, 43 (2): 69, 网络出版: 2016-03-23
GPU 平台二维快速傅里叶变换算法实现及应用
Realization and Application of Two-dimensional Fast Fourier Transform Algorithm Based on GPU
摘要
NVIDIA 在其GPU 平台上开发的FFT 库CUFFT 经过几次升级,但在二维FFT 实现上效率还有提升空间,而且对于特定不能与上下文的计算融合,导致多次对Global memory 的访问。本文分析合并内存访问事务大小与占用率之间的关系,优化使用GPU 存储器资源,对小数据量2 次幂二维复数FFT 在GPU 上的实现进行改进,加速比最高达到CUFFT 6.5 的1.27 倍。利用实数FFT 结果的共轭对称性,算法的效率比复数FFT 算法运算量降低了40%。最后将FFT 的改进应用到光学传递函数(OTF)的计算中,采用Kernel 融合的方法,使得OTF 的计算效率比CUFFT 计算方法提高了1.5 倍。
Abstract
NVIDIA as the inventor of the GPU provides a library function CUFFT for computing Fast Fourier Transform (FFT). After several generations update of CUFFT, there is still promotion space and it is not suit for kernel fusing on GPU to reduce the memory access and increase the Instruction Level Parallelism (ILP). We develop our own custom GPU FFT implementation based on the well-known Cooley-Tukey algorithm. We analyze the relationship of coalesce memory access and occupancy of GPU and get the optimal configuration of thread block. The results show that the proposed method improved the computational efficiency by 1.27 times than CUFFT 6.5 for double complex data 512×512. And then it is used to the computation of OTF with kernel fusing strategy, and it improved the efficiency of computation about 1.5 times than conventional method using CUFFT.
张全, 鲍华, 饶长辉, 彭真明. GPU 平台二维快速傅里叶变换算法实现及应用[J]. 光电工程, 2016, 43(2): 69. ZHANG Quan, BAO Hua, RAO Changhui, PENG Zhenming. Realization and Application of Two-dimensional Fast Fourier Transform Algorithm Based on GPU[J]. Opto-Electronic Engineering, 2016, 43(2): 69.