Optimization-inspired manual architecture design and neural architecture search
Author affiliations: JD Explore Academy; School of Mathematical Sciences, Peking University; Institute of Robotics and Automatic Information Systems, College of Artificial Intelligence, Nankai University; Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Institute for Artificial Intelligence, Peking University; Pazhou Lab
Journal: Science China Information Sciences (中国科学:信息科学(英文版))
Year/Volume/Issue: 2023, Vol. 66, No. 11
Pages: 100-112
Subject classification: 12 [Management]; 1201 [Management Science and Engineering (degrees in Management or Engineering)]; 081104 [Engineering - Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 080203 [Engineering - Mechanical Design and Theory]; 0835 [Engineering - Software Engineering]; 0802 [Engineering - Mechanical Engineering]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in Engineering or Science)]
Funding: Supported by the National Key R&D Program of China (Grant No. 2022ZD0160302) and the National Natural Science Foundation of China (Grant No. 62276004)
Keywords: deep neural network; manual architecture design; neural architecture search; image recognition; optimization algorithms; learning-based optimization
Abstract: Neural architecture has been a research focus in recent years because of its importance in determining the performance of deep networks. Representative designs include the residual network (ResNet) with skip connections and the dense network (DenseNet) with dense connections. However, theoretical guidance for manual architecture design and neural architecture search (NAS) is still lacking. In this paper, we propose a manual architecture design framework inspired by optimization algorithms. It is based on the conjecture that an optimization algorithm with a good convergence rate may imply a neural architecture with good performance. Concretely, we prove under certain conditions that forward propagation in a deep neural network is equivalent to the iterative optimization procedure of the gradient descent algorithm minimizing a cost function. Inspired by this correspondence, we derive neural architectures from fast optimization algorithms, including the heavy ball algorithm and Nesterov's accelerated gradient descent algorithm. Surprisingly, we find that ResNet and DenseNet can be viewed as special cases of the optimization-inspired architectures. These architectures offer not only theoretical guidance but also good performance in image recognition on multiple datasets, including CIFAR-10, CIFAR-100, and ImageNet. Moreover, we show that our method is also useful for NAS by offering a good initial search point or guiding the search space.
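The correspondence sketched in the abstract can be illustrated with a small, hedged example. A plain residual block mirrors one gradient-descent step, x_{l+1} = x_l + F(x_l) (with F playing the role of -alpha * grad f), so the heavy ball update x_{k+1} = x_k - alpha * grad f(x_k) + beta * (x_k - x_{k-1}) suggests a block with an extra momentum-style connection to the layer before last. The sketch below assumes PyTorch; the class name HeavyBallBlock, the two-convolution branch, and the fixed beta are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch (PyTorch assumed): a heavy-ball-inspired residual block.
# Plain ResNet block ~ gradient descent: x_{l+1} = x_l + F(x_l).
# Heavy ball adds momentum:              x_{l+1} = x_l + F(x_l) + beta * (x_l - x_{l-1}).
import torch
import torch.nn as nn

class HeavyBallBlock(nn.Module):
    def __init__(self, channels, beta=0.9):
        super().__init__()
        # F(.) is an ordinary two-convolution residual branch here; the paper's branch may differ.
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.beta = beta

    def forward(self, x, x_prev):
        # x: output of layer l, x_prev: output of layer l-1 (the extra "momentum" input).
        out = x + self.branch(x) + self.beta * (x - x_prev)
        return out, x  # return (x_{l+1}, x_l) so the next block sees both states

# Usage sketch: stack blocks, threading the current and previous states through the network.
blocks = nn.ModuleList([HeavyBallBlock(64) for _ in range(3)])
x_prev = torch.randn(1, 64, 32, 32)
x = x_prev.clone()
for blk in blocks:
    x, x_prev = blk(x, x_prev)

Setting beta = 0 recovers the ordinary ResNet-style block, which is consistent with the abstract's claim that ResNet is a special case of the optimization-inspired family.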