Newton design: designing CNNs with the family of Newton's methods
Author affiliations: School of Mathematical Sciences, Peking University; Bytedance AI Lab; JD Explore Academy; Key Laboratory of Machine Perception, School of Intelligence Science and Technology, Peking University; Pazhou Lab
Publication: Science China (Information Sciences) (中国科学:信息科学, English edition)
Year, volume, issue: 2023, Vol. 66, No. 6
Pages: 122-137
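Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees in Management or Engineering may be conferred)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in Engineering or Science may be conferred)]
Funding: Supported by the National Key R&D Program of China (Grant No. 2022ZD0160302), the Major Key Project of PCL, China (Grant No. PCL2021A12), and the National Natural Science Foundation of China (Grant No. 62276004)
Keywords: CNN; dropout; optimization method; network design; Newton's method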
Abstract: Nowadays, convolutional neural networks (CNNs) have led the developments of machine ***, most CNN architectures are obtained by manual design, which is empirical, time-consuming, and non-transparent. In this paper, we aim to offer better insight into CNN models from the perspective of optimization theory. We propose a unified framework for understanding and designing CNN architectures with the family of Newton’s methods, which is referred to as Newton design. Specifically, we observe that the standard feedforward CNN model (PlainNet) solves an optimization problem via a kind of quasi-Newton method. Interestingly, the residual network (ResNet) can also be derived if we use a more general quasi-Newton method to solve this problem. Based on these observations, we solve this problem via a better method, the Newton-conjugate-gradient (Newton-CG) method, which inspires Newton-CGNet. In the network design, we translate binary-valued terms in the optimization schemes to dropout layers, so dropout modules naturally appear in the derived CNN structures at specific locations, rather than being an empirical training *** experiments on image classification and text categorization tasks verify that Newton-CGNets perform very competitively. In particular, Newton-CGNets surpass their ResNet counterparts by over 4% on CIFAR-10 and over 10% on CIFAR-100, respectively.
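For readers unfamiliar with the Newton-CG method named in the abstract, the following is a minimal sketch of the generic optimization technique only, not the paper's network derivation or Newton-CGNet itself. It assumes a small, strictly convex toy objective, and every name in it (`cg_solve`, `newton_cg`, `grad`, `hvp_at`, `C`, `B`) is hypothetical and chosen for illustration. Each outer step solves the Newton system H d = -g approximately with conjugate gradient, using only Hessian-vector products.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_solve(hvp, g, rel_tol=1e-8, max_iter=50):
    """Approximately solve H d = -g by conjugate gradient, using only H*v products."""
    d = [0.0] * len(g)
    r = [-gi for gi in g]            # residual of H d = -g at d = 0
    p = list(r)
    rs = dot(r, r)
    g_norm = rs ** 0.5
    for _ in range(max_iter):
        if rs ** 0.5 <= rel_tol * g_norm:
            break
        Hp = hvp(p)
        curv = dot(p, Hp)
        if curv <= 0:                # non-positive curvature: stop early
            return d if any(d) else r
        alpha = rs / curv
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

def newton_cg(grad, hvp_at, x, steps=20, tol=1e-10):
    """Outer Newton loop with full steps (no line search; fine for this toy problem)."""
    for _ in range(steps):
        g = grad(x)
        if dot(g, g) ** 0.5 < tol:
            break
        d = cg_solve(lambda v: hvp_at(x, v), g)
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Toy objective (an assumption): f(x) = sum(0.25*x_i^4 + 0.5*C_i*x_i^2 - B_i*x_i)
C = [1.0, 10.0]
B = [1.0, 2.0]

def grad(x):
    return [xi ** 3 + ci * xi - bi for xi, ci, bi in zip(x, C, B)]

def hvp_at(x, v):
    # The Hessian is diagonal here: H_ii = 3*x_i^2 + C_i
    return [(3.0 * xi ** 2 + ci) * vi for xi, ci, vi in zip(x, C, v)]

x_star = newton_cg(grad, hvp_at, [0.0, 0.0])
print(x_star, grad(x_star))
```

The inner solver never forms the Hessian explicitly, which is the property that makes Newton-CG practical for large problems; a production implementation would normally add a line search or trust region to the outer loop.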