A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

Source: arXiv
Abstract

Intensive computation is entering data centers with multiple deep-learning workloads. To balance compute efficiency, performance, and total cost of ownership (TCO), field-programmable gate arrays (FPGAs) with reconfigurable logic offer acceptable acceleration capacity and are compatible with diverse computation-sensitive tasks in the cloud. In this paper, we develop an FPGA acceleration platform that leverages a unified framework architecture for general-purpose convolutional neural network (CNN) inference acceleration in a data center. To overcome the computation bound, 4,096 DSPs are assembled and shaped into supertile units (SUs) for different types of convolution, providing up to 4.2 TOP/s of 16-bit fixed-point performance at 500 MHz. An interleaved-task-dispatching method is proposed to map the computation across the SUs, and the memory bound is addressed by a dispatching-assembling buffering model and broadcast caches. For the various non-convolution operators, a filter processing unit is designed for general-purpose filter-like/pointwise operators. In the experiments, the performance of CNN models running on server-class CPUs, a GPU, and an FPGA is compared. The results show that our design achieves the best FPGA peak performance and a throughput at the same level as a state-of-the-art data-center GPU, with more than 50 times lower latency.
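As a rough, back-of-the-envelope sanity check on the quoted peak figure (an illustrative estimate, not a calculation taken from the paper), assume each DSP completes one 16-bit multiply-accumulate, i.e. 2 operations, per cycle:

    4,096 DSPs × 2 ops/cycle × 500 MHz ≈ 4.1 TOP/s

which is consistent with the reported 4.2 TOP/s peak; the exact published number may additionally count arithmetic performed outside the DSP MACs.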

Dewei Chen, Jie Miao, Yu Meng, Xiaoyu Yu, Ephrem Wu, Heng Zhang, Bo Zhang, Biao Min, Yuwei Wang, Jianlin Gao

DOI: 10.1109/FPL.2019.00032

Subjects: Microelectronics and Integrated Circuits; Computing Technology and Computer Technology

Dewei Chen, Jie Miao, Yu Meng, Xiaoyu Yu, Ephrem Wu, Heng Zhang, Bo Zhang, Biao Min, Yuwei Wang, Jianlin Gao. A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks [EB/OL]. (2019-09-16) [2025-08-02]. https://arxiv.org/abs/1909.07973.
