A Consensus-Based Global Optimization Method with Adaptive Momentum Estimation
Author affiliations: School of Mathematical Sciences, University of Science and Technology of China, Hefei, Anhui 230026, China; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China; School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
Publication: Communications in Computational Physics
Year/Volume/Issue: 2022, Volume 31, Issue 4
Pages: 1296-1316
Subject classification: 07 [Science]; 0701 [Science - Mathematics]; 070101 [Science - Fundamental Mathematics]
Funding: J. Chen was supported by the National Natural Science Foundation of China via grant 11971021; S. Jin was supported by the Natural Science Foundation of China under grant 12031013.
Keywords: consensus-based optimization; global optimization; machine learning; curse of dimensionality
Abstract: Objective functions in large-scale machine learning and artificial intelligence applications often live in high dimensions with strong non-convexity and massive local minima. Gradient-based methods, such as the stochastic gradient method and Adam [15], and gradient-free methods, such as the consensus-based optimization (CBO) method, can be employed to find global minima. In this work, based on the CBO method and Adam, we propose a consensus-based global optimization method with adaptive momentum estimation (Adam-CBO). Advantages of the Adam-CBO method include:
- It is capable of finding global minima of non-convex objective functions with high success rates and low costs. This is verified by finding the global minimizer of the 1000-dimensional Rastrigin function with a 100% success rate at a cost growing only linearly with respect to the dimensionality.
- It can handle non-differentiable activation functions and thus approximate low-regularity functions with better accuracy. This is confirmed by solving a machine learning task for partial differential equations with low-regularity solutions, where the Adam-CBO method provides better results than Adam.
- It is robust in the sense that its convergence is insensitive to the learning rate, as shown by a linear stability analysis. This is confirmed by finding the minimizer of a quadratic function.
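To make the idea behind the method concrete, below is a minimal sketch, not taken from the paper, of one way to combine a CBO-style consensus step with Adam-style moment estimates. The consensus point is the standard Laplace-principle weighted average from the CBO literature; the function and parameter names (adam_cbo_step, beta, lr, sigma) and the anisotropic noise scaling are illustrative assumptions, and the exact Adam-CBO update rule is the one given in the paper.

```python
import numpy as np

def consensus_point(X, f, beta=30.0):
    """Weighted average of particles with Laplace-principle weights
    exp(-beta * f(x)), standard in the CBO literature."""
    fx = np.array([f(x) for x in X])
    w = np.exp(-beta * (fx - fx.min()))   # shift by min for numerical stability
    return (w[:, None] * X).sum(axis=0) / w.sum()

def adam_cbo_step(X, m, v, f, t, lr=0.01, beta1=0.9, beta2=0.999,
                  sigma=1.0, eps=1e-8, beta=30.0):
    """One illustrative step (assumed form, not the paper's exact scheme):
    Adam-style first/second-moment estimates applied to the drift toward
    the consensus point, plus CBO-style exploration noise."""
    xbar = consensus_point(X, f, beta)
    drift = X - xbar                       # pull each particle toward consensus
    m = beta1 * m + (1 - beta1) * drift            # first-moment estimate
    v = beta2 * v + (1 - beta2) * drift ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                   # bias corrections, t >= 1
    v_hat = v / (1 - beta2 ** t)
    noise = sigma * np.abs(drift) * np.random.randn(*X.shape)
    X = X - lr * m_hat / (np.sqrt(v_hat) + eps) + np.sqrt(lr) * noise
    return X, m, v

# Usage sketch: 100 particles on the 10-dimensional Rastrigin function,
# the benchmark family mentioned in the abstract.
rastrigin = lambda x: 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
X = np.random.uniform(-5, 5, (100, 10))
m, v = np.zeros_like(X), np.zeros_like(X)
for t in range(1, 1001):
    X, m, v = adam_cbo_step(X, m, v, rastrigin, t)
```

Because the step uses only evaluations of f, never its gradient, a sketch of this kind remains well defined for non-differentiable objectives, which is consistent with the abstract's claim about non-differentiable activation functions.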