STOCHASTIC TRAINING OF RESIDUAL NETWORKS IN DEEP LEARNING
Sun Qi¹, Tao Yunzhe², Du Qiang³
1 Beijing International Center for Mathematical Research, Peking University, Beijing 100871, China;
2 Amazon Web Services Artificial Intelligence, Seattle, WA 98121, USA;
3 Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA
Stochastic training of artificial neural networks, used to improve the capability of deep learning models, has attracted much attention in recent years. With prescribed model parameters for the network structure, the method of modified equations can be adopted to understand the intrinsic features of the stochastic training approach. This analysis shows that residual networks, and their variants with noise injection, can be regarded as weak approximations of stochastic differential equations. Such connections allow us to bring together stochastic training processes and the optimal control of backward Kolmogorov equations, offering a new understanding of the regularization effects on the loss landscape from the differential equation and optimal control perspective, and further suggesting reliable, efficient, and explainable stochastic training strategies. A binary classification task based on Bernoulli dropout within a residual network architecture is used as an example to illustrate and substantiate our theoretical claims.
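To make the stated connection concrete, consider the following hedged sketch; the drift and diffusion coefficients below are generic placeholders, and the specific forms analyzed in the paper may differ. A residual block with step size $\Delta t$ performs the forward-Euler update
$$x_{n+1} = x_n + \Delta t \, f(x_n, \theta_n),$$
and injecting scaled noise $\xi_n \sim \mathcal{N}(0, I)$ on the residual branch,
$$x_{n+1} = x_n + \Delta t \, f(x_n, \theta_n) + \sqrt{\Delta t}\, \sigma(x_n)\, \xi_n,$$
is exactly one Euler--Maruyama step, and hence a weak approximation as $\Delta t \to 0$, of the stochastic differential equation
$$\mathrm{d}X_t = f(X_t, \theta_t)\, \mathrm{d}t + \sigma(X_t)\, \mathrm{d}W_t.$$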
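As a companion illustration of the Bernoulli dropout example, the following is a minimal PyTorch sketch of a residual block whose residual branch is gated by a Bernoulli variable during training. The class name StochasticResidualBlock, the keep probability, and the toy two-block classifier are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class StochasticResidualBlock(nn.Module):
    # Hypothetical residual block: the residual branch is gated by a
    # Bernoulli(keep_prob) variable during training, one possible form of
    # the noise injection discussed in the paper (an assumption, not the
    # authors' exact architecture).
    def __init__(self, dim: int, keep_prob: float = 0.8):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
        self.keep_prob = keep_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample one Bernoulli gate per example; dividing by keep_prob
            # keeps the residual update unbiased in expectation.
            z = torch.bernoulli(
                torch.full((x.size(0), 1), self.keep_prob, device=x.device)
            )
            return x + (z / self.keep_prob) * self.f(x)
        # Inference: use the deterministic expected update.
        return x + self.f(x)

# Illustrative binary classifier: stacked stochastic residual blocks
# followed by a linear read-out producing one logit per example.
model = nn.Sequential(
    StochasticResidualBlock(2),
    StochasticResidualBlock(2),
    nn.Linear(2, 1),
)
x = torch.randn(16, 2)  # a toy batch of 2-d features
loss = nn.BCEWithLogitsLoss()(model(x), torch.randint(0, 2, (16, 1)).float())

During training the gate z / keep_prob keeps the residual update unbiased in expectation, matching the weak-approximation viewpoint sketched above, while at inference the deterministic mean update is used.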