
稀疏优化二阶算法研究进展

王锐1, 修乃华2   

  1. 北京大学北京国际数学研究中心, 北京 100871;
    2. 北京交通大学理学院数学系, 北京 100044
  • 收稿日期:2022-05-31 发布日期:2022-09-09
  • 基金资助:
    国家自然科学基金(11971052,12131004)和北京自然科学基金(Z190002)资助.

王锐, 修乃华. 稀疏优化二阶算法研究进展[J]. 数值计算与计算机应用, 2022, 43(3): 314-328.

Wang Rui, Xiu Naihua. PROGRESS OF SECOND-ORDER METHODS FOR SPARSE OPTIMIZATION[J]. Journal on Numerical Methods and Computer Applications, 2022, 43(3): 314-328.

PROGRESS OF SECOND-ORDER METHODS FOR SPARSE OPTIMIZATION

Wang Rui1, Xiu Naihua2   

  1. Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China;
    2. Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing 100044, China
  • Received:2022-05-31 Published:2022-09-09
稀疏优化是最优化学科近十余年发展起来的一个崭新分支.它主要研究带有稀疏结构特征的优化问题,在大数据分析与处理、机器学习与人工智能、信号与图像处理、生物信息学等学科领域有广泛应用.然而,稀疏优化属于非凸非连续优化,是一个NP-难问题.一直以来,人们通常采用一阶算法求解大规模稀疏优化问题,并取得了丰富成果.为提高计算速度和求解精度,人们近几年创新发展了若干计算花费少的二阶算法,本文主要介绍与评述其研究进展,奉献给读者.
Sparse optimization, developed over the past decade or so, is a new branch of the optimization discipline. It mainly studies optimization problems with sparse structure and is widely applied in fields including big data analysis and processing, machine learning and artificial intelligence, signal and image processing, and bioinformatics. However, sparse optimization problems are nonconvex and discontinuous, and are NP-hard. First-order algorithms have long been the common choice for solving large-scale sparse optimization problems and have produced a rich body of results. To improve computational speed and solution accuracy, a number of second-order algorithms with low computational cost have been developed in recent years. This paper introduces and reviews this research progress for the reader.
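To make the setting concrete, a canonical instance of the problems discussed above is the sparsity-constrained least-squares model min_x 0.5||Ax-b||^2 subject to ||x||_0 <= s, and a representative second-order idea is to combine hard thresholding with a Newton step restricted to the current support, so that only a small s-by-s linear system is solved per iteration. The Python sketch below illustrates this idea in the spirit of the Newton-type hard-thresholding methods surveyed in the paper (e.g. [59-62]); the function name, step size, and synthetic test data are illustrative assumptions, not the exact algorithm of any particular reference.

```python
import numpy as np

def newton_hard_thresholding(A, b, s, step=1.0, max_iter=100, tol=1e-8):
    """Hard-thresholding method with a Newton-type refinement on the support,
    for min 0.5*||Ax - b||^2 subject to ||x||_0 <= s.
    A simplified illustration, not the exact algorithm of any cited paper."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)              # gradient of the least-squares loss
        u = x - step * grad                   # gradient step
        support = np.argsort(np.abs(u))[-s:]  # keep the s largest entries (hard thresholding)
        # Newton step restricted to the support: for a quadratic loss this is
        # the exact minimizer over the selected s coordinates (an s-by-s solve).
        As = A[:, support]
        xs = np.linalg.solve(As.T @ As + 1e-12 * np.eye(s), As.T @ b)
        x_new = np.zeros(n)
        x_new[support] = xs
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            return x_new                      # support has stabilized
        x = x_new
    return x

# Synthetic test: recover a 5-sparse vector from 80 random Gaussian measurements.
rng = np.random.default_rng(0)
n, m, s = 200, 80, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true
x_hat = newton_hard_thresholding(A, b, s)
print("recovery error:", np.linalg.norm(x_hat - x_true))  # expected to be near machine precision
```

The per-iteration cost is dominated by one matrix-vector product with A and one s-by-s linear solve, which is why such second-order steps remain cheap when the sparsity level s is small.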

[1] Candès E J. Compressive sampling[C]. In Proceedings of the International Congress of Mathematicians, 2006.
[2] Candès E J, Romberg J and Tao T. Robust uncertainty principles:exact signal reconstruction from highly incomplete frequency information[J]. IEEE Transactions on Information Theory, 2006, 52(2):489-509.
[3] Candès E J and Tao T. Decoding by linear programming[J]. IEEE Transactions on Information Theory, 2005, 51(12):4203-4215.
[4] Donoho D L. Compressed sensing[J]. IEEE Transactions on Information Theory, 2006, 52(4):1289-1306.
[5] Beck A. First-Order Methods in Optimization[M]. SIAM, Philadelphia, PA, 2017.
[6] Eldar Y C and Kutyniok G. Compressed Sensing:Theory and Applications[M]. Cambridge University Press, 2012.
[7] Fan J, Li R, Zhang C and Zou H. Statistical Foundations of Data Science. CRC Press, Taylor and Francis, FL, 2020.
[8] Rish I and Grabarnik G Y. Sparse Modeling:Theory, Algorithms, and Applications[M]. CRC Press, Inc. 2014.
[9] Hastie T, Tibshirani R and Wainwright M. Statistical learning with sparsity:The lasso and generalizations. CRC Press, 2015.
[10] Zhao Y. Sparse optimization theory and methods[M]. CRC Press, 2018.
[11] Geer, S V D. Estimation and testing under sparsity[M]. Springer International Publishing, 2016.
[12] 文再文, 印卧涛, 刘歆, 张寅. 压缩感知和稀疏优化简介[J]. 运筹学学报, 2012, 16(3):49-64.
[13] 许志强. 压缩感知[J]. 中国科学, 2012, 42(9):865-877.
[14] 于春梅. 稀疏优化算法综述[J]. 计算机工程与应用, 2014, 50(11):210-217.
[15] 赵晨, 罗自炎, 修乃华. 稀疏优化理论与算法若干新进展[J]. 运筹学学报, 2020, 24(4):1-24.
[16] Song J, Li J, Yao Z, Ma K and Bao C. Zero Norm based Analysis Model for Image Smoothing and Reconstruction[J]. Inverse Problems, 2020, 36(11):115009.
[17] Tillmann A M, Bienstock D, Lodi A and Schwartz A. Cardinality minimization, constraints, and regularization:a survey. arXiv:2106.09606, 2021.
[18] Tibshirani R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society. Series B (Methodological), 1996, 58(1):267-288.
[19] Kim S J, Koh K, Lustig M, Boyd S and Gorinevsky D. An interior-point method for large-scale l1-regularized least squares[J]. IEEE Journal of Selected Topics in Signal Processing, 2007, 1(4):606-617.
[20] Zou H. The adaptive lasso and its oracle properties[J]. Journal of the American Statistical Association, 2006, 101(476):1418-1429.
[21] Zou H and Hastie T. Regularization and variable selection via the elastic net[J]. Journal of the Royal Statistical Society:Series B (Statistical Methodology), 2005, 67:301-320.
[22] Huang J, Ma S, Xie H and Zhang C. A group bridge approach for variable selection[J]. Biometrika, 2009, 96(2):339-355.
[23] Nikolova M, Ng M, Zhang S and Ching W K. Efficient reconstruction of piecewise constant images using nonsmooth nonconvex minimization[J]. SIAM Journal on Imaging Sciences, 2008, 1(1):2-25.
[24] Fan J and Li R. Variable selection via nonconcave penalized likelihood and its oracle properties[J]. Journal of the American statistical Association, 2001, 96(456):1348-1360.
[25] Zhang C. Nearly unbiased variable selection under minimax concave penalty[J]. The Annals of Statistics, 2010, 38(2):894-942.
[26] Ong C and An Le T. Learning sparse classifiers with difference of convex functions algorithms[J]. Optimization Methods and Software, 2013, 28(4):830-854.
[27] Zhang T. Analysis of multi-stage convex relaxation for sparse regularization[J]. Journal of Machine Learning Research, 2010, 11(35):1081-1107.
[28] Soubies E, Blanc-Féraud L and Aubert G. A continuous exact l0 penalty (CEL0) for least squares regularized problem[J]. SIAM Journal on Imaging Sciences, 2015, 8(3):1607-1639.
[29] Krishnapuram B, Carin L, Figueiredo M A T and Hartemink A J. Sparse multinomial logistic regression:Fast algorithms and generalization bounds[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 6(27):957-968.
[30] Figueiredo M A T. Adaptive sparseness for supervised learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9):1150-1159.
[31] Andrew G and Gao J. Scalable training of l1-regularized log-linear models[C]. In Proceedings of the 24th International Conference on Machine Learning, 2007, 33-40.
[32] Yu J, Vishwanathan S V N, Gunter S and Schraudolph N N. A quasi-Newton approach to nonsmooth convex optimization problems in machine learning[J]. Journal of Machine Learning Research, 2010, 11(3):1145-1200.
[33] Hsieh C, Sustik M, Dhillon I and Ravikumar P. Sparse inverse covariance matrix estimation using quadratic approximation[C]. In Proceedings of the Advances in Neural Information Processing Systems, 2011, 24:2330-2338.
[34] Olsen P, Oztoprak F, Nocedal J and Rennie S. Newton-like methods for sparse inverse covariance estimation[C]. In Advances in Neural Information Processing Systems, 2012, 25:755-763.
[35] Li X, Sun D and Toh K C. A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems[J]. SIAM Journal on Optimization, 2016, 28(1):433-458.
[36] Zhang Y, Zhang N, Sun D and Toh K C. An efficient hessian based algorithm for solving large-scale sparse group lasso problems[J]. Mathematical Programming, 2020, 179:223-263.
[37] Luo Z, Sun D, Toh K C and Xiu N. Solving the OSCAR and SLOPE models using a semismooth Newton-based augmented Lagrangian method[J]. Journal of Machine Learning Research, 2019, 20(106):1-25.
[38] Zhang Y, Zhang N, Sun D and Toh K C. A proximal point dual Newton algorithm for solving group graphical lasso problems[J]. SIAM Journal on Optimization, 2020, 30(3):2197-2220.
[39] Li X, Sun D and Toh K-C. On efficiently solving the subproblems of a level-set method for fused lasso problems[J]. SIAM Journal on Optimization, 2018, 28(2):1842-1866.
[40] Koh K, Kim S-J and Boyd S. An interior-point method for large-scale l1-regularized logistic regression[J]. Journal of Machine Learning Research, 2007, 8(7):1519-1555.
[41] Shi J, Yin W, Osher S and Sajda P. A fast hybrid algorithm for large-scale l1-regularized logistic regression[J]. Journal of Machine Learning Research, 2010, 11(2):713-741.
[42] Yuan G, Chang K, Hsieh C J and Lin C J. A comparison of optimization methods and software for large-scale l1-regularized linear classification[J]. Journal of Machine Learning Research, 2010, 11(11):3183-3234.
[43] Friedman J, Hastie T and Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent[J]. Journal of Statistical Software, 2010, 33(1):1-22.
[44] Lee J, Sun Y and Saunders M. Proximal Newton-type methods for minimizing composite functions[J]. SIAM Journal on Optimization, 2014, 24:1420-1443.
[45] Schmidt M. Graphical model structure learning with l1-regularization[D]. University of British Columbia, 2010.
[46] Schmidt M, Kim D and Sra S. Projected Newton-type methods in machine learning. Optimization for Machine Learning. MIT Press, 2011.
[47] Becker S and Fadili J. A quasi-Newton proximal splitting method[C]. In Advances in Neural Information Processing Systems, 2012, 25:2618-2626.
[48] Shi Z and Liu R. Large scale optimization with proximal stochastic Newton-type gradient descent[C]. Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer International Publishing, 2015, 691-704.
[49] Milzarek A, Xiao X, Cen S, Wen Z and Ulbrich M. A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization[J]. SIAM Journal on Optimization, 2019, 29(4):2916-2948.
[50] Ito K and Kunisch K. A variational approach to sparsity optimization based on Lagrange multiplier theory[J]. Inverse Problems, 2014, 30(1):015001.
[51] Jiao Y, Jin B and Lu X. A primal dual active set with continuation algorithm for the l0-regularized optimization problem[J]. Applied and Computational Harmonic Analysis, 2015, 39(3):400-426.
[52] Huang J, Jiao Y, Liu Y and Lu X. A constructive approach to l0 penalized regression[J]. Journal of Machine Learning Research, 2018, 19:1-37.
[53] Zhou S, Pan L and Xiu N. Newton method for the l0-regularized optimization. Numerical Algorithms, https://doi.org/10.1007/s11075-021-01085-x, 2021.
[54] Chen X, Niu L and Yuan Y. Optimality conditions and a smoothing trust region Newton method for non-Lipschitz optimization[J]. SIAM Journal on Optimization, 2013, 23(3):1528-1552.
[55] Bian W, Chen X and Ye Y. Complexity analysis of interior point algorithms for non-lipschitz and nonconvex minimization[J]. Mathematical Programming, 2015, 149:301-327.
[56] Gong P and Ye J. HONOR:Hybrid optimization for non-convex regularized problems[C]. In Advances in Neural Information Processing Systems, 2015, 415-423.
[57] Rakotomamonjy A, Flamary R and Gasso G. DC proximal Newton for nonconvex optimization problems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(3):636-647.
[58] Bahmani S, Raj B and Boufounos P. Greedy sparsity-constrained optimization[J]. Journal of Machine Learning Research, 2013, 14:807-841.
[59] Chen J and Gu Q. Fast Newton hard thresholding pursuit for sparsity constrained non-convex optimization[C]. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, 757-766.
[60] Yuan X and Liu Q. Newton-type greedy selection methods for l0-constrained minimization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12):2437-2450.
[61] Wang R, Xiu N and Zhang C. Greedy projected gradient-Newton method for sparse logistic regression[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(2):527-538.
[62] Zhou S, Xiu N and Qi H. Global and quadratic convergence of Newton hard-thresholding pursuit[J]. Journal of Machine Learning Research, 2019, 22(12):1-45.
[63] Wang R, Xiu N and Toh K C. Subspace quadratic regularization method for group sparse multinomial logistic regression[J]. Computational Optimization and Applications, 2021, 79:531-559.
[64] Zhao C, Xiu N, Qi H and Luo Z. A Lagrange-Newton algorithm for sparse nonlinear programming. arXiv:2004.13257, 2020.