A revisit to MacKay algorithm and its application to deep network compression

Chune LI, Yongyi MAO, Richong ZHANG, Jinpeng HUAI

Front. Comput. Sci., 2020, Vol. 14, Issue 4: 144304. DOI: 10.1007/s11704-019-8390-z

RESEARCH ARTICLE


Abstract

An iterative procedure introduced in MacKay’s evidence framework is often used for estimating the hyperparameter in empirical Bayes. Together with the use of a particular form of prior, the estimation of the hyperparameter reduces to an automatic relevance determination model, which provides a soft way of pruning model parameters. Despite the effectiveness of this estimation procedure, it has remained primarily a heuristic to date, and its application to deep neural networks has not yet been explored. This paper formally investigates the mathematical nature of this procedure and justifies it as a well-principled algorithmic framework, which we call the MacKay algorithm. As an application, we demonstrate its use in deep neural networks, which typically have complicated structures with millions of parameters and can be pruned to reduce memory requirements and boost computational efficiency. In experiments, we adopt the MacKay algorithm to prune the parameters of simple networks such as LeNet, deep convolutional VGG-like networks, and residual networks for a large-scale image classification task. Experimental results show that the algorithm can compress neural networks to a high level of sparsity with little loss of prediction accuracy, comparable with the state of the art.
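
To make the hyperparameter re-estimation concrete, the sketch below illustrates the classical fixed-point updates prescribed by MacKay’s evidence framework for a linear-Gaussian model with an automatic relevance determination prior, where a per-weight precision that diverges effectively prunes that weight. This is a minimal illustration of the kind of update the paper builds on, not the paper’s deep-network procedure; the function name `mackay_ard`, the `prune_threshold` cutoff, and the toy data are illustrative assumptions.

```python
import numpy as np

def mackay_ard(Phi, y, n_iter=100, alpha_init=1.0, beta_init=1.0,
               prune_threshold=1e6):
    """Minimal sketch of MacKay-style fixed-point updates for automatic
    relevance determination in a linear-Gaussian model y = Phi @ w + noise,
    with prior w_i ~ N(0, 1/alpha_i) and noise precision beta."""
    N, M = Phi.shape
    alpha = np.full(M, alpha_init)   # per-weight prior precisions
    beta = beta_init                 # noise precision

    for _ in range(n_iter):
        # Posterior over weights given the current hyperparameters
        A = np.diag(alpha)
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + A)   # posterior covariance
        m = beta * Sigma @ Phi.T @ y                    # posterior mean

        # Effective number of well-determined parameters per weight
        gamma = 1.0 - alpha * np.diag(Sigma)

        # MacKay's re-estimation formulas for the hyperparameters
        alpha = gamma / (m ** 2 + 1e-12)
        alpha = np.minimum(alpha, 1e12)  # cap for numerical stability
        beta = (N - gamma.sum()) / (np.sum((y - Phi @ m) ** 2) + 1e-12)

    # Weights whose precision has blown up are treated as pruned
    keep = alpha < prune_threshold
    return m * keep, alpha, beta, keep


if __name__ == "__main__":
    # Toy example: only 3 of 20 features are relevant
    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(200, 20))
    w_true = np.zeros(20)
    w_true[[2, 7, 15]] = [1.5, -2.0, 0.8]
    y = Phi @ w_true + 0.1 * rng.normal(size=200)

    w_est, alpha, beta, keep = mackay_ard(Phi, y)
    print("kept features:", np.flatnonzero(keep))
```

In this sketch the precisions of irrelevant weights grow without bound across iterations, which is the “soft pruning” behaviour the abstract refers to; extending the same idea to deep networks with millions of parameters is the subject of the paper.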

Keywords

deep learning / MacKay algorithm / model compression / neural network

Cite this article

Chune LI, Yongyi MAO, Richong ZHANG, Jinpeng HUAI. A revisit to MacKay algorithm and its application to deep network compression. Front. Comput. Sci., 2020, 14(4): 144304. DOI: 10.1007/s11704-019-8390-z



RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

Supplementary files

FCS-0004-18390-CL_suppl_1
