Optics and Precision Engineering, 2020, 28(5): 1212, published online: 2020-11-06
FPGA-based hardware acceleration for CNNs developed using high-level synthesis
Keywords: deep learning; field-programmable gate array (FPGA); high-level synthesis; hardware acceleration circuits
Abstract
To accelerate the forward propagation of neural networks in hardware, we designed a field-programmable gate array (FPGA) acceleration system for AlexNet forward propagation using the Vivado High-Level Synthesis (HLS) tool. With Vivado HLS, the FPGA circuit is simulated and developed in C++ rather than in a hardware-description language, which shortens development time and lowers development cost while still meeting application requirements. The design was optimized with the Vivado HLS PIPELINE and ARRAY_PARTITION directives. Experimental results show that AlexNet runs in 21.95 ms on the proposed FPGA system, compared with 70 ms on a conventional graphics processing unit (GPU) platform, a speedup of more than three times (70/21.95 ≈ 3.2). In addition, each network layer is encapsulated separately, so the system can be readily ported to other established convolutional neural networks, which supports the deployment of deep learning in a range of artificial-intelligence applications.
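As a rough illustration of how the PIPELINE and ARRAY_PARTITION directives are typically applied in Vivado HLS C++, the sketch below wraps a single convolution layer (with ReLU) in its own function. It is not the authors' code: the function name `conv_layer`, the layer dimensions, and the placement of the pragmas are all assumptions chosen to keep the example small. A standard C++ compiler ignores the `#pragma HLS` lines, so the same source can be checked in software before synthesis.

```cpp
// Hypothetical layer dimensions, chosen small for illustration (not AlexNet's).
constexpr int IN_H = 6;
constexpr int IN_W = 6;
constexpr int K = 3;                    // kernel size
constexpr int OUT_H = IN_H - K + 1;     // valid convolution, stride 1
constexpr int OUT_W = IN_W - K + 1;

// One convolution layer (plus ReLU) wrapped as a standalone function, so that
// it could be synthesized and reused independently of the other layers.
void conv_layer(const float in[IN_H][IN_W],
                const float weights[K][K],
                float bias,
                float out[OUT_H][OUT_W]) {
    // Local copy of the kernel. Fully partitioning it asks the tool to map
    // every element to its own register so all K*K weights can be read in
    // the same clock cycle (assumed placement of the directive).
    float w[K][K];
#pragma HLS ARRAY_PARTITION variable=w complete dim=0
    for (int i = 0; i < K; ++i) {
        for (int j = 0; j < K; ++j) {
            w[i][j] = weights[i][j];
        }
    }

    for (int r = 0; r < OUT_H; ++r) {
        for (int c = 0; c < OUT_W; ++c) {
            // Request one new output pixel per clock cycle. Whether the tool
            // actually reaches II=1 depends on the memory ports available
            // for `in`, which is left unpartitioned in this sketch.
#pragma HLS PIPELINE II=1
            float acc = bias;
            for (int i = 0; i < K; ++i) {
                for (int j = 0; j < K; ++j) {
                    acc += in[r + i][c + j] * w[i][j];
                }
            }
            out[r][c] = (acc > 0.0f) ? acc : 0.0f;  // ReLU activation
        }
    }
}
```

Keeping each layer behind a plain function signature like this is one way to obtain the per-layer encapsulation the abstract describes: a different network could, in principle, be assembled by calling layer functions with different dimensions and weights.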
WEI Chu-liang, CHEN Ru-lin, GAO Qian, SUN Zheng-long. FPGA-based hardware acceleration for CNNs developed using high-level synthesis[J]. Optics and Precision Engineering, 2020, 28(5): 1212.