• 青年评述 •

### 复合凸优化的快速邻近点算法

1. 复旦大学大数据学院, 上海数学中心, 上海 200433
• 收稿日期:2020-09-28 出版日期:2020-11-15 发布日期:2020-11-15
• 作者简介:郦旭东,复旦大学大数据学院青年研究员,上海数学中心青年研究员.2010年本科毕业于中国科学技术大学数学系,2015年在新加坡国立大学数学系获博士学位.博士毕业后曾在新加坡国立大学数学系与美国普林斯顿大学运筹与金融工程系任博士后研究员,2018年9月入职复旦大学,于2019年获得由国际数学优化协会(Mathematical Optimization Society)所颁发的连续优化青年学者最佳论文奖,2020年入选第五届中国科协青年人才托举工程,现任期刊《Mathematical Programming Computation》编委.
• 基金资助:

国家自然科学基金（11901107），中国科协青年人才托举工程（2019QNRC001），上海市扬帆计划（19YF1402600），上海市科委项目（19511120700）资助.

Li Xudong. EFFICIENT PROXIMAL POINT ALGORITHM FOR CONVEX COMPOSITE OPTIMIZATION[J]. Mathematica Numerica Sinica, 2020, 42(4): 385-404.

### EFFICIENT PROXIMAL POINT ALGORITHM FOR CONVEX COMPOSITE OPTIMIZATION

Li Xudong

1. School of Data Science, and Shanghai Center for Mathematical Sciences, Fudan University, Shanghai 200433, China
• Received:2020-09-28 Online:2020-11-15 Published:2020-11-15

In the Big Data era, with the advent of convenient automated data collection technologies, large-scale composite convex optimization problems are ubiquitous in many applications, such as massive data analysis, machine and statistical learning, image and signal processing. In this paper, we review a class of efficient proximal point algorithms for solving the large-scale composite convex optimization problems. Under the easy-to-implement stopping criteria and mild calmness conditions, we show the proximal point algorithm enjoys global and local asymptotic superlinear convergence. Meanwhile, based on the duality theory, we propose an efficient semismooth Newton method for handling the subproblems in the proximal point algorithm. Lastly, to further accelerate the proximal point algorithm, we fully exploit the nonsmooth second order information induced by the nonsmooth regularizer in the problem to achieve a dramatic reduction of the computational costs of solving the involved semismooth Newton linear systems.

MR(2010)主题分类:

()
 [1] Beck A and Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J]. SIAM Journal on Imaging Sciences, 2009, 2:183-202.[2] Benjamin R, Fazel M and Parrilo P A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization[J]. SIAM Review, 2010, 52:471-501.[3] Bertsekas D P. Nonlinear Programming[M], Athena Scientific, 1999.[4] Best M J and Chakravarti N. Active set algorithms for isotonic regression; a unifying framework[J]. Mathematical Programming, 1990, 47:425-439.[5] Bogdan M, van den Berg E, Sabatti C, Su W and Candès E J. SLOPE-adaptive variable selection via convex optimization[J]. Annals of Applied Statistics, 2015, 9:1103-1140.[6] Bondell H D and Reich B J. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR[J]. Biometrics, 2008, 64:115-123.[7] Bauschke H H and Combettes P L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces[M]. Springer, New York, 2011.[8] Clarke F H. Optimization and Nonsmooth Analysis[M]. SIAM, 1990.[9] Condat L. A direct algorithm for 1-D total variation denoising[J]. IEEE Signal Processing Letters, 2013, 20:1054-1057.[10] Cui Y, Ding C and Zhao X Y. Quadratic growth conditions for convex matrix optimization problems associated with spectral functions[J]. SIAM Journal on Optimization, 2017, 27:2332-2355.[11] Cui Y, Sun D F and Toh K C. On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming[J]. Mathematical Programming, 2019, 178:381-415.[12] Ding C. An Introduction to a Class of Matrix Optimization Problems[D]. Ph.D Thesis, Department of Mathematics, National University of Singapore, 2012.[13] Ding C, Sun D F and Toh K C. An introduction to a class of matrix cone programming[J]. Mathematical Programming, 2014, 144:141-179.[14] Dontchev A L and Rockafellar R T. Implicit Functions and Solution Mappings[M]. Springer, New York, 2009.[15] Fischer A. Local behavior of an iterative framework for generalized equations with nonisolated solutions[J]. Mathematical Programming, 2002, 94:91-124.[16] Facchinei F, Fischer A and Herrich M. An LP-Newton method:nonsmooth equations, KKT systems, and nonisolated solutions[J]. Mathematical Programming, 2014, 146:1-36.[17] Facchinei F and Pang J S. Finite-Dimensional Variational Inequalities and Complementarity Problems[M]. Springer, New York, 2003.[18] Friedman J, Hastie T, Hofling H and Tibshirani R. Pathwise coordinate optimization[J]. The annals of applied statistics, 2007, 1:302-332.[19] Gabay D and Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation[J]. Computers & Mathematics with Applications, 1976, 2:17-40.[20] Glowinski R and Marroco A. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires[J]. Revue française d'automatique, informatique, recherche opérationnelle. Analyse numérique, 1975, 9:41-76.[21] Golub G and Van Loan C F. Matrix Computations[M]. 3nd ed., Johns Hopkins University Press, Baltimore, MD, 1996.[22] Goebel R and Rockafellar R T. Local strong convexity and local Lipschitz continuity of the gradient of convex functions[J]. Journal of Convex Analysis, 2008, 15:263-270.[23] Hiriart-Urruty J B, Strodiot J J and Nguyen V H. Generalized Hessian matrix and second-order optimality conditions for problems with C1,1 data[J]. Applied Mathematics and Optimization, 1984, 11:43-56.[24] Kummer B, Newton's method for non-differentiable functions[J]. Advances in Mathematical Optimization, 1988, 45:114-125.[25] Lee J D, Sun Y and Saunders M A. Proximal Newton-type methods for minimizing composite functions[J]. SIAM Journal on Optimization, 2014, 24:1420-1443.[26] Leventhal D. Metric subregularity and the proximal point method[J]. Journal of Mathematical Analysis and Applications, 2009, 360:681-688.[27] Li X D, Sun D F and Toh K C. A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions[J]. Mathematical Programming, 2016, 155:333-373.[28] Li X D, Sun D F and Toh K C. QSDPNAL:A two-phase augmented Lagrangian method for convex quadratic semidefinite programming[J]. Mathematical Programming Computation, 2018, 10:703-743.[29] Li X D, Sun D F and Toh K C. A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems[J]. SIAM Journal on Optimization, 2018, 28:433-458.[30] Li X D, Sun D F and Toh K C. On efficiently solving the subproblems of a level-set method for fused Lasso problems[J]. SIAM Journal on Optimization, 2018, 28:1842-1866.[31] Li X D, Sun D F and Toh K C. On the efficient computation of a generalized Jacobian of the projector over the Birkhoff polytope[J]. Mathematical Programming, 2020, 178:419-446.[32] Li X D, Sun D F and Toh K C, An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for Linear Programming[J]. SIAM Journal on Optimization, 2020, 30:2410-2440.[33] Li G Y and Mordukhovich B S. Hölder metric subregularity with applications to proximal point method[J]. SIAM Journal on Optimization, 2012, 22:1655-1684.[34] Lin M, Liu Y J, Sun D and Toh K C. Efficient sparse hessian based algorithms for the clustered lasso problem[J]. SIAM Journal on Optimization, 2019, 29:2026-2052.[35] Lin M, Sun D F, Toh K C and Yuan Y. A dual Newton based preconditioned proximal point algorithm for exclusive lasso models, arXiv:1902.00151, 2019.[36] Luo Z Q and Tseng P. On the linear convergence of descent methods for convex essentially smooth minimization[J]. SIAM J. Control and Optimization, 1992, 30:408-425.[37] Luo Z Q and Tseng P. Error bounds and convergence analysis of feasible descent methods:a general approach[J]. Annals of Operations Research, 1993, 46:157-178.[38] Luo Z, Sun D F, Toh K C and Xiu N. Solving the OSCAR and SLOPE models using a semismooth Newton-based augmented Lagrangian method[J]. Journal of Machine Learning Research, 2019, 20:1-25.[39] Luque F J. Asymptotic convergence analysis of the proximal point algorithm[J]. SIAM Journal on Control and Optimization, 1984, 22:277-293.[40] Mifflin R. Semismooth and semiconvex functions in constrained optimization[J]. SIAM Journal on Control and Optimization, 1977, 15:959-972.[41] Mordukhovich B S and Ouyang W. Higher-order metric subregularity and its applications[J]. Journal of Global Optimization, 2015, 63:777-795.[42] Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k2)[J]. Soviet Mathematics Doklady, 1983, 27:372-376.[43] Qi H and Sun D F. A quadratically convergent Newton method for computing the nearest correlation matrix[J]. SIAM Journal on Matrix Analysis and Applications, 2006, 28:360-385.[44] Qi L and Sun J. A nonsmooth version of Newton's method[J]. Mathematical Programming, 1993, 58:353-367.[45] Robinson S M. An implicit-function theorem for generalized variational inequalties. Technical Summary Report No. 1672, Mathematics Research Center, University of Wisconsin-Madison, 1976; available from National Technical Information Service under Accession No. ADA031952.[46] Robinson S M. Some continuity properties of polyhedral multifunctions, In Mathematical Programming at Oberwolfach, vol. 14 of Mathematical Programming Studies, Springer, Berlin, Heidelberg, 1981, 206-214.[47] Rockafellar R T. Convex Analysis[M]. Princeton University Press, 1970.[48] Rockafellar R T. Monotone operators and the proximal point algorithm[J]. SIAM Journal on Control and Optimization, 1976, 14:877-898.[49] Rockafellar R T. Augmented Lagrangians and applications of the proximal point algorithm in convex programming[J]. Mathematics of Operations Research, 1976, 1:97-116.[50] Rockafellar R T and Wets R J B. Variational Analysis[M]. Springer, New York, 1998.[51] She Y. Sparse regression with exact clustering[J]. Electronic Journal of Statistics, 2010, 4:1055-1096.[52] Sun D F and Sun J. Semismooth matrix-valued functions[J]. Mathematics of Operations Research, 2002, 27:150-169.[53] Sun J. On Monotropic Piecewise Qudratic Programming[D]. Ph.D Thesis, Department of Mathematics, University of Washington, Seattle, 1986.[54] Tang P, Wang C, Sun D F and Toh K C. A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems. Journal of Machine Learning Research, in print, 2020.[55] Tibshirani R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society:Series B, 1996, 58:267-288.[56] Tibshirani R, Saunders M, Rosset S, Zhu J and Knight K. Sparsity and smoothness via the fused lasso[M]. Journal of the Royal Statistical Society:Series B, 2005, 67:91-108.[57] Tseng P and Yun S. A coordinate gradient descent method for nonsmooth separable minimization[J]. Mathematical Programming, 2010, 125:387-423.[58] Tseng P. Approximation accuracy, gradient methods, and error bound for structured convex optimization[J]. Mathematical Programming, 2010, 125:263-295.[59] Xiao X, Li Y, Wen Z and Zhang L. A regularized semi-smooth Newton method with projection steps for composite convex programs[J]. Journal of Scientific Computing, 2018, 76:364-389.[60] Yang L Q, Sun D F and Toh K C. SDPNAL+:A majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints[J]. Mathematical Programming Computation, 2015, 7:331-366.[61] Yu Y. On decomposing the proximal map, in Advances in Neural Information Processing Systems, 2013, 91-99.[62] Yuan M and Lin Y. Model selection and estimation in regression with grouped variables[J]. Journal of the Royal Statistical Society:Series B, 2006, 68:49-67.[63] Yue M X, Zhou Z and So A M C. A family of inexact SQA methods for non-smooth convex minimization with provable convergence guarantees based on the Luo-Tseng error bound property[J]. Mathematical Programming, 2019, 174:327-358.[64] Zhao X Y, Sun D F and Toh K C. A Newton-CG augmented Lagrangian method for semidefinite programming[J]. SIAM Journal on Optimization, 2010, 20:1737-1765.[65] Zhang Y J, Zhang N, Sun D F and Toh K C. An efficient Hessian based algorithm for solving large-scale sparse group Lasso problems[J]. Mathematical Programming, 2020, 179, 223-263.[66] Zhou Z R and So A M C. A unified approach to error bounds for structured convex optimization problems[J]. Mathematical Programming, 2017, 165:689-728.[67] Zou H and Hastie T. Regularization and variable selection via the elastic net[J]. Journal of the Royal Statistical Society:Series B, 2005, 67:301-320.
 [1] 申远, 李倩倩, 吴坚. 一种基于邻近点算法的变步长原始-对偶算法[J]. 计算数学, 2018, 40(1): 85-95. [2] 徐海文, 孙黎明. 一类凸优化的加速混合下降算法[J]. 计算数学, 2017, 39(2): 200-212. [3] 徐海文. 一类凸优化的混合下降算法[J]. 计算数学, 2012, 34(1): 93-102.