Name: 张俸玺  Student ID: 20012100022  College: 竹园三号书院
Reposted from https://blog.csdn.net/qq_38798425/article/details/109124572
【嵌牛 Overview】The FPGA (field-programmable gate array) is a relatively new technology that remains unfamiliar to most people, yet it has become a hot topic, and implementing neural networks on FPGAs is one of today's most active technical subjects. This article organizes FPGA+CNN papers into categories.
【嵌牛 Keywords】FPGA, CNN, paper classification
【嵌牛 Question】How should FPGA+CNN papers be categorized?
【嵌牛 Body】
Fast computation
Fast computation comes in two flavors. The first relies on fast algorithms, mainly the FFT and the Winograd algorithm, both of which speed up convolution: the FFT is better suited to larger kernels, while Winograd is better suited to deployment on hardware platforms (a more detailed analysis can be found on Zhihu). The second makes better use of the available compute resources (the DSP units). A minimal sketch of the two fast algorithms is given right below.
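To make the two fast algorithms concrete, here is a minimal NumPy sketch (not taken from any of the papers below; function and variable names are illustrative only) that computes a 1-D valid convolution directly, via the FFT, and via Winograd F(2,3):

```python
import numpy as np

def conv1d_direct(d, g):
    """Direct 1-D valid convolution (cross-correlation, as used in CNNs)."""
    n = len(d) - len(g) + 1
    return np.array([np.dot(d[i:i + len(g)], g) for i in range(n)])

def conv1d_fft(d, g, out_len):
    """Linear convolution via the FFT: zero-pad, multiply spectra, invert.
    The kernel is reversed so the result matches CNN-style cross-correlation."""
    L = len(d) + len(g) - 1
    D = np.fft.rfft(d, L)
    G = np.fft.rfft(g[::-1], L)
    full = np.fft.irfft(D * G, L)
    start = len(g) - 1                      # drop the boundary samples
    return full[start:start + out_len]

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter using 4 multiplications
    instead of the 6 required by the direct method."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d, g = np.random.randn(4), np.random.randn(3)
ref = conv1d_direct(d, g)
assert np.allclose(conv1d_fft(d, g, len(ref)), ref)
assert np.allclose(winograd_f23(d, g), ref)
```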
FFT:
[2013]-Fast Training of Convolutional Networks through FFTs
[2016]-Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add
[2017]-Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System
[2018]-A Framework for Generating High Throughput CNN
[2017]-Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs (this one covers both Winograd and FFT)
Winograd:
[1980]- Multiplication of Polynomials Modulo a Polynomial
[2016]-Fast Algorithms for Convolutional Neural Networks
[2018]-Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA
[2018]-A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm
[2018]-A Novel Low-Communication Energy-Efficient Reconfigurable CNN Acceleration Architecture
[2019]-Towards an Efficient Deep Pipelined Template-Based Architecture for Accelerating the Entire 2D and 3D CNNs on FPGA (this is the journal extension of the conference paper above)
[2019]-Accelerating 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System
[2020]-Stride 2 1-D, 2-D, and 3-D Winograd for Convolutional Neural Network
[2020]-A Power-Efficient Optimizing Framework FPGA Accelerator Based on Winograd for YOLO
DSP reuse
[2017]-Double MAC Doubling the Performance of Convolutional
plus the journal extension of the paper above
[2019]-A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA
Replacing multiplications with shifts
[2017]-Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
[2019]-Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
Spectral convolution
[2019]-A Flexible Design Automation Tool for Accelerating Quantized Spectral CNNs
[2020]-Reuse Kernels or Activations? A Flexible Dataflow for Low-latency Spectral CNN Acceleration
GEMM
[2016]-Fast Algorithms for Convolutional Neural Networks
[2019]-High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture (this paper combines GEMM with Winograd)
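For context, "GEMM" here refers to lowering convolution to a single large matrix multiply via im2col, which is the form such accelerators map onto their MAC arrays. A minimal illustrative sketch, not drawn from any specific paper above (the function name is my own):

```python
import numpy as np

def conv2d_im2col(x, w):
    """Lower a 2-D convolution (stride 1, no padding) to one GEMM:
    every receptive field of the input becomes a column of a matrix,
    and the kernel becomes a row vector multiplied against it."""
    H, W = x.shape
    K = w.shape[0]
    OH, OW = H - K + 1, W - K + 1
    cols = np.empty((K * K, OH * OW))           # the "im2col" buffer
    for i in range(OH):
        for j in range(OW):
            cols[:, i * OW + j] = x[i:i + K, j:j + K].ravel()
    return (w.ravel() @ cols).reshape(OH, OW)   # GEMM, then reshape

x, w = np.random.randn(6, 6), np.random.randn(3, 3)
ref = np.array([[np.sum(x[i:i + 3, j:j + 3] * w) for j in range(4)] for i in range(4)])
assert np.allclose(conv2d_im2col(x, w), ref)
```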
Others
Changing the order of multiply-accumulate operations to reduce computation
[2020]-Sparse-YOLO: Hardware/Software Co-Design of an FPGA Accelerator for YOLOv2
Block floating-point computation
[2018]-Reconfigurable Acceleration of 3D-CNNs for Human Action Recognition with Block Floating-Point Representation
Engineering applications
There are quite a few papers on engineering applications; listed here are some of the more recent ones.
[2019]-A Simplified Speaker Recognition System Based on FPGA Platform (blind source separation)
[2019]-A Real-Time Convolutional Neural Network for Super-Resolution on FPGA With Applications to 4K UHD 60 fps Video Services (video)
[2019]-Acceleration of FPGA Based Convolutional Neural Network for Human Activity Classification Using Millimeter-Wave Radar (millimeter-wave radar)
[2019]-Towards an Efficient Accelerator for DNN-based Remote Sensing Image Segmentation on FPGAs (segmentation)
[2020]-Deep Learning Approach for Epileptic Focus Localization (epileptic focus localization)
[2020]-On the Use of FPGAs to Implement CNNs: A Brief Review (this review lists quite a few applications)
Compilers
A major contribution of many recent papers is a compiler. FPGA programming is, frankly, hard, especially since the many optimizations that are sometimes required cannot be done with HLS, which makes compilers all the more important.
[2016]-Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA
[2019]-TensorFlow to Cloud FPGAs: Tradeoffs for Accelerating Deep Neural Networks
[2020]-Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators
[2020]-A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU
[2020]-End-to-End Optimization of Deep Learning Applications
Model structure
[2017]-Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
[2018]-FBNA: A Fully Binarized Neural Network Accelerator
[2018]-Towards Efficient Convolutional Neural Network for Domain-Specific Applications on FPGA
[2019]-Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
[2019]-LUTNet: Rethinking Inference in FPGA Soft Logic (the ideas in this paper are well worth reading)
Hardware architecture
[2017]-Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks
[2018]-CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks
[2018]-RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks
[2018]-A CNN Accelerator on FPGA Using Depthwise Separable Convolution
[2018]-Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA
[2019]-MulNet: A Flexible CNN Processor With Higher Resource Utilization Efficiency for Constrained Devices
[2019]-A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA
[2020]-A Novel FPGA Accelerator Design for Real-Time and Ultra-Low Power Deep Convolutional Neural Networks Compared With Titan X GPU
[2020]-LACS: A High-Computational-Efficiency Accelerator for CNNs
[2020]-Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks
[2020]-Reuse Kernels or Activations? A Flexible Dataflow for Low-latency Spectral CNN Acceleration
[2020]-End-to-End Optimization of Deep Learning Applications
[2020]-Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators
[2020]-Performance Modeling for CNN Inference Accelerators on FPGA
Training
A few papers address training, but such work is very rare. Training itself is not especially complex computationally (it is still matrix multiplication), but it consumes far more resources. In particular, many forward-pass optimizations free the feature maps in order to reuse storage, and a design like that clearly cannot be used for training any more (which is why these are really just tricks); a minimal sketch of why is given right below.
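Here is a minimal NumPy sketch of why the activations must be kept, using a fully-connected layer as a stand-in for a convolution (illustrative only, not taken from the papers below): the backward pass reuses the layer's forward input, so that feature map cannot be freed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))   # input activations (the "feature map")
W = rng.standard_normal((16, 4))   # layer weights

# Forward pass: an inference-only accelerator could free x right after this.
y = x @ W

# Backward pass: the weight gradient is built from the *saved* input x,
# so training requires keeping the activation resident in memory.
dy = rng.standard_normal(y.shape)  # gradient arriving from the next layer
dW = x.T @ dy                      # needs x from the forward pass
dx = dy @ W.T                      # gradient propagated to the previous layer
```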
[2019]-Automatic Compiler Based FPGA Accelerator for CNN Training
[2019]-FPGA-based Training Accelerator Utilizing Sparseness of Convolutional Neural Network