June 17, Yanzhi Wang: 5,000X model compression in DNNs; But, is it truly desirable?

Posted: 2019-06-09


Lecture title: 5,000X model compression in DNNs; But, is it truly desirable?

Speaker: Yanzhi Wang, Assistant Professor

Host: Liang Shi (石亮), Professor

Start time: 2019-06-17 11:00:00  End time: 2019-06-17 12:00:00

Venue: Lecture Hall 201, Mathematics Building

Organizer: School of Computer Science and Software Engineering

  

Speaker bio:

Yanzhi Wang is currently an assistant professor in the Department of Electrical and Computer Engineering at Northeastern University. He received his Ph.D. degree in Computer Engineering from the University of Southern California (USC) in 2014 and his B.S. degree with distinction in Electronic Engineering from Tsinghua University in 2009. Dr. Wang's current research interests are energy-efficient and high-performance implementations of deep learning and artificial intelligence systems, as well as the integration of security protection into deep learning systems. His work has been published in top conferences and journals (e.g. ASPLOS, ISCA, MICRO, HPCA, ISSCC, AAAI, ICML, CVPR, ICLR, IJCAI, ECCV, ACM MM, CCS, VLDB, FPGA, DAC, ICCAD, DATE, LCTES, INFOCOM, ICDCS, Nature SP, etc.) and has been cited more than 4,200 times according to Google Scholar. He has received four Best Paper Awards, seven further Best Paper Nominations, and two Popular Paper recognitions in IEEE TCAD. His group is sponsored by the NSF, DARPA, IARPA, AFRL/AFOSR, SRC, and industry sources.


Abstract:

Hardware implementation of deep neural networks (DNNs), with an emphasis on performance and energy efficiency, has been the focus of extensive ongoing investigation. When large DNNs are mapped to hardware as an inference engine, the resulting hardware suffers from significant performance and energy overheads. To overcome this hurdle, we develop ADMM-NN, an algorithm-hardware co-optimization framework that greatly reduces DNN computation and storage requirements by incorporating the Alternating Direction Method of Multipliers (ADMM) and utilizing all redundancy sources in DNNs. Our preliminary results show that ADMM-NN can achieve the highest degree of model compression on representative DNNs. For example, we can achieve 348X, 63X, 34X, and 17X weight reduction on LeNet-5, AlexNet, VGGNet, and ResNet-50, respectively, with (almost) no accuracy loss. We achieve a maximum of 4,438X reduction in weight data storage when combining weight pruning and weight quantization, while maintaining accuracy. However, a second problem arises. With non-structured pruning, the position of every remaining weight must be stored as an index, and the storage needed for these indices is even higher than that for the weight values themselves, especially when weight quantization is in place. The question then is: can incorporating structure into weight pruning result in even less storage than general, non-structured pruning? If the answer is yes, is non-structured pruning still a viable approach at all? Through extensive investigation, we found that the answer to the first question is yes in most cases at the same accuracy. As a result, we recommend against continuing to build DNN inference engines based on non-structured sparsity.
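The index-storage argument in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the talk: the layer size, pruning rates, bit widths, and both function names are hypothetical values chosen only to show how per-weight indices can outweigh quantized weight values, and it ignores details such as padding in relative-index encodings.

```python
def nonstructured_bits(kept_weights, weight_bits, index_bits):
    """Return (value_bits, index_bits_total) when each kept weight stores its own index."""
    value_bits = kept_weights * weight_bits
    index_total = kept_weights * index_bits
    return value_bits, index_total


def structured_bits(rows_kept, cols, weight_bits, row_index_bits):
    """Return (value_bits, index_bits_total) when whole rows are kept, one index per row."""
    value_bits = rows_kept * cols * weight_bits
    index_total = rows_kept * row_index_bits
    return value_bits, index_total


# Hypothetical 1000 x 1000 layer with 3-bit quantized weights.
# Assumption: non-structured pruning keeps 1 in 20 weights (8-bit index each),
# while row-structured pruning keeps only 1 in 10 at the same accuracy.
ns_values, ns_index = nonstructured_bits(kept_weights=50_000, weight_bits=3, index_bits=8)
st_values, st_index = structured_bits(rows_kept=100, cols=1000, weight_bits=3, row_index_bits=10)

print("non-structured:", ns_values, "value bits +", ns_index, "index bits")
print("structured:    ", st_values, "value bits +", st_index, "index bits")
```

Under these assumed numbers the non-structured layer keeps half as many weights yet needs more total storage, because its indices cost more than its quantized values. Whether that holds for a real network depends on the actual pruning rates and index encoding.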

  

